EVALUATION


This report summarizes the development of a real-time keyword
recognition system with the basic objective of detecting when a keyword
occurs in continuous speech independent of speaker. The initial
speaker-independent results (over 50 different speakers) of about 85%
detection and 20-25 false alarms per hour are very promising. The ability
to perform reliable keyword detection would greatly enhance the ability
to perform other related speech signal processing tasks, such as speaker
identification and continuous word recognition.

ROBERT A. CURTIS, Captain, USAF
Project Engineer


SUMMARY 


This report summarizes the development of a
spoken key word recognition system developed by Dialog
Systems, Inc. for Rome Air Development Center under
contract F30602-75-C-0171. The basic objectives for
the system were the ability to detect one or more key
words in continuous speech, independent of language,
speaker, or spoken text. The development was to be
initially performed in the English language and with
the effectiveness goal of 90% detection of key words
and 5 or fewer false alarms per hour. The system was
also required to work over telephone lines and radio
links without being susceptible to spectral equalization
or distortion. The resulting device could then be used
to monitor all types of broadcast material and, by a
proper selection of key words, to determine the gist of
the real-time or recorded speech.

Key word spotting differs from continuous speech
understanding in that one trains a word spotting machine
to understand one or a few words and hopes that it will
reject all other words without actually having to
understand any of them. In passing from simple
closed-vocabulary recognition to word spotting, one
encounters the following major problems:
1) The relative number of wrong choices (potential
false alarms) that must be rejected is very large, since
the range of words and phrases that might be put to the
machine is unlimited.

2) There is no known reliable method of segmenting the
continuous speech material into words or syllables on the
basis of short term acoustic cues.

3) The acoustic description of a word changes with the
verbal context in which it appears. The relative timing
and duration of events can vary radically; the phonetic
character of sounds at the beginning and end of the word
can be modified strongly by the preceding and following
words; and whole syllables are sometimes omitted or other
sounds added.


Dialog's starting point in this program was a
previously-developed algorithm which recognizes single
words spoken in isolation with high talker-independent
accuracy over unknown telephone lines. This algorithm
was extended to recognize continuous speech by detecting
half-syllable sized utterances in sequence. A high-speed
digital vector arithmetic processor was designed and
constructed to perform the lengthy calculations required
in real time, and the software was re-written for the
new machine.


After the vector processor was fabricated and
programmed, the key word recognition effort concentrated
on a 6-minute script consisting of a hypothetical news
broadcast, which included the word "Kissinger" in four
different contexts, a set of airline-to-ground exchanges,
and a short list of random numbers. About 77 renditions
of the script were obtained in English, 3 in Spanish, 2 in
French, and one in Chinese, all over the telephone. At
about the time a data base was being made up of these voices,
Dialog received 13 renditions of the script on wide-band tape
recordings. These did not appear to be compatible with the
telephone voices, so the main effort was switched to these
tapes. A data base was then made of the four "Kissinger"
renditions from each of nine speakers, and this material
became the subject of an intense development effort. Four
voices were held out of the data base as test inputs.

The resulting word spotting technique was capable of
operating in real time for at least two key words
simultaneously, and obtained an effectiveness of 90%-95%
detection of a single key word, with 4-6 false alarms per hour
against the limited number of test voices. It was found
that the limited data base could not hold this accuracy
against a wider population of test voices; therefore the
data base was increased to 41 voices from additional wide
band tape recordings. Ten test voices, not included in
the data base, were used for a series of runs under various
test conditions.

Although some parts of the original isolated word
algorithm have yet to be included in the word spotting tests,
the results to date are good for a moderately wide
population of talkers. As shown in Table A, the system could
be made to operate in the untrained mode at 83% detection
of key words and 24 false alarms per hour, or 70% detection
and 6 false alarms per hour, depending on the threshold
settings for each pattern. By training the machine on the
first rendition of the key word the characteristics
improved somewhat.

Although this performance is only marginally
adequate for operational equipment, the general capability
of this technique in key word detection appears very
promising towards meeting its objectives with further
development.

The variability of target word character and of
material to be rejected is quite large, and predictions
based on experience with as many as 41 different talkers
are subject to large statistical errors. In the course
of this project, statistical models have been developed
for several aspects of the recognition process. These
models, discussed in a later section, seem to shed some
light on the problems of implementing and testing
improved speech processing algorithms.
I  Experimental Study And Analysis


1.  Summary  of  the  Word  Spotting  Algorithm 

A block  diagram  of  the  major  processing  steps  of 
the  continuous  speech  recognition  procedure  is  shown  in 
Figure  1.  The  analog  speech  waveform  is  digitized  with 
12-bit  resolution  at  an  8 KHz  sampling  rate.  Spectrum 
frames  having  32  points  (129  Hz  spacing  between  sample 
frequencies)  are  computed  every  10  milliseconds  in  block 
A of  Figure  1.  The  spectrum  analysis  involves  taking  the 
cosine  transform  of  the  output  of  a hardware  autocorrelator 
at  each  desired  frequency.  The  correlator  is  provided  so 
that  alternate  predictive  coding  implementations  can  be 
used.  Smoothing  in  frequency  and  time  also  occurs  at  A, 
primarily  to  remove  abrupt  transitions  in  the  spectrum  and 
aliasing  with  the  pitch  period  of  the  voiced  portions  of 
the  speech. 
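The spectrum analysis step can be illustrated with a minimal sketch. It assumes a biased autocorrelation estimate, 32 lags, and a 160-sample frame, none of which are given in the text; only the 8 KHz sampling rate and the 32 analysis frequencies come from the report. The function names are hypothetical and the hardware correlator is replaced by a plain software loop.

```python
import math

FS = 8000.0      # sampling rate in Hz (from the text)
N_POINTS = 32    # spectrum points per frame (from the text)
N_LAGS = 32      # number of autocorrelation lags (assumed, not given in the text)


def autocorrelation(frame, n_lags=N_LAGS):
    """Biased autocorrelation estimate r[0..n_lags-1] of one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) / n
            for k in range(n_lags)]


def cosine_transform_spectrum(r, n_points=N_POINTS, fs=FS):
    """Evaluate S(f) = r[0] + 2 * sum_k r[k] * cos(2*pi*f*k/fs) at each frequency."""
    # 32 equally spaced frequencies from 0 to fs/2 give about 129 Hz spacing.
    freqs = [i * (fs / 2.0) / (n_points - 1) for i in range(n_points)]
    spectrum = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        spectrum.append(r[0] + 2.0 * sum(r[k] * math.cos(w * k)
                                         for k in range(1, len(r))))
    return freqs, spectrum


# Example: a pure tone placed exactly on the analysis frequency with index 8.
tone_hz = 8 * (FS / 2.0) / (N_POINTS - 1)
frame = [math.sin(2.0 * math.pi * tone_hz * t / FS) for t in range(160)]
freqs, spectrum = cosine_transform_spectrum(autocorrelation(frame))
peak_index = max(range(N_POINTS), key=lambda i: spectrum[i])
```

The smoothing in frequency and time mentioned in the text is not included in this sketch.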

In  the  next  processing  steps  the  sequence  of  spectrum 
frames  is  normalized  and  enhanced  with  respect  to  frequency, 
amplitude  and  time.  The  first  step  is  to  accumulate  a gross 
picture  of  the  input  spectrum,  at  B,  which  is  then  used  to 
adapt  the  system  to  the  unknown  frequency  response  of  the 
communications  channel.  Provision  for  subtracting  background 
noise  can  be  made  at  this  point,  but  is  not  yet  implemented  in 
a real-time  system.  Since  the  channel  equalizer  uses 
received  speech  to  estimate  the  channel  response,  some  of 
the between-talker variation is removed by the equalizer.

Bias  and  fluctuation  in  the  talker's  rate  of  articulation 
must  be  accounted  for  in  order  to  decide  which  frame  of 
a stored  reference  pattern  corresponds  to  which  frame  of 
the  incoming  speech.  Our  algorithm  uses  a psychologically- 
based  measure  of  "subjective  time"  computed  from  a weighted 
sum  of  the  time  derivatives  of  the  spectrum  elements. 

The  frames  are  sampled  at  equal  increments  of  subjective 
time  to  form  a test  pattern  of  three  samples;  every  input 
frame  is  the  starting  sample  of  a test  pattern.  This 
method  yields  rather  accurate  temporal  registration  between 
reference  and  test  patterns  over  intervals  of  about  half 
a spoken  syllable. 
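The sampling of frames at equal increments of subjective time might be sketched as follows. The report does not give the weights, the form of the derivative, or the increment, so this sketch assumes a weighted sum of absolute frame-to-frame differences and a caller-supplied step; the function names are hypothetical.

```python
def subjective_time(frames, weights):
    """Cumulative subjective time, one value per frame.

    Assumed form: each frame advances the clock by a weighted sum of the
    absolute changes of the spectrum elements since the previous frame.
    """
    tau = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        rate = sum(w * abs(c - p) for w, p, c in zip(weights, prev, cur))
        tau.append(tau[-1] + rate)
    return tau


def sample_pattern(tau, start, step, n_samples=3):
    """Indices of n_samples frames spaced `step` apart in subjective time.

    Returns None when the input ends before the pattern is complete.
    """
    picked = [start]
    idx = start
    for m in range(1, n_samples):
        target = tau[start] + m * step
        while idx < len(tau) - 1 and tau[idx] < target:
            idx += 1
        if tau[idx] < target:
            return None
        picked.append(idx)
    return picked


# One-element "spectra": steady stretches contribute no subjective time,
# so the sampled frames cluster where the spectrum is changing.
frames = [[0.0], [0.0], [0.0], [4.0], [4.0], [8.0]]
tau = subjective_time(frames, weights=[1.0])
pattern = sample_pattern(tau, start=0, step=4.0)
```

Calling `sample_pattern` once for every value of `start` corresponds to the statement that every input frame is the starting sample of a test pattern.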

In block F the position and width of each spectrum
line is enhanced by limiting high and low amplitude spectrum
coefficients with a function of the form (1-x)/(1+x).

The  center  of  this  transfer  function  rides  up  and  down 
with  the  average  power  level  and  the  function  is  approximately 
logarithmic  over  a range  of  about  10  to  1.  After  channel 
equalization,  the  peaks  and  valleys  of  the  spectrum  no  longer 
correspond  to  an  all-pole  rational  transfer  function  model  of 
the  talker's  vocal  tract,  and  it  appears  that  detailed 
measurement  of  the  height  of  a peak  or  the  depth  of  a valley 
adds  nothing  to  the  recognition  accuracy  of  the  system. 

The  chosen  limiting  function  strongly  resembles  a typical 
firing  rate  function  of  an  auditory  nerve. 
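A short sketch may help show why a function of this form limits both extremes and is roughly logarithmic near its center. The report does not say what x is; the sketch assumes x is the ratio of the average level to the coefficient amplitude. Under that assumption the function equals tanh of half the log amplitude ratio, which is an exact identity. The name `limiter` is hypothetical.

```python
import math


def limiter(amplitude, mean_level):
    """Limiting function of the form (1 - x) / (1 + x).

    Assumption: x = mean_level / amplitude, so the output is 0 at the
    average level, tends to +1 for large amplitudes and to -1 for small ones.
    """
    x = mean_level / amplitude
    return (1.0 - x) / (1.0 + x)


# Identity under the assumption above:
# (1 - x) / (1 + x) = tanh(0.5 * ln(amplitude / mean_level)),
# which is nearly linear in log amplitude close to the center.
ratio = 10.0
as_tanh = math.tanh(0.5 * math.log(ratio))
```

Because the center is tied to `mean_level`, the transfer function moves with the average power level, as the text describes.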
Transformation of the frequency axis, shown at E,
is not yet implemented. Some form of frequency rescaling
may be expected to improve the speaker-independent recognition
accuracy if, for example, a method of adapting the transformation
to the voice quality without explicit training can be found.

Up  to  this  point  the  algorithm  has  paid  no  particular 
attention  to  the  content  of  the  input  signals.  Now,  however, 
having  normalized  and  enhanced  the  data  with  respect  to  its 
important  physical  dimensions,  the  system  makes  an 
effort  to  enhance  the  phonetic  content  of  the  spectrum  patterns. 
This  is  implemented  by  linearly  projecting  the  data  from 
the  frequency  domain  into  an  abstract  space  in  which  sounds 
belonging  to  different  phonetic  speech  classes  are  maximally 
separated.  The  32  spectrum  coefficients  from  the  three 
sampled  frames  comprise  a 96-element  vector  which  is  transformed 
by  matrix  multiplication  into  the  new  space.  The  coefficients 
of  the  transformation  matrix  are  constants  evaluated  by  factor 
analysis  of  labeled  training  data.  In  our  isolated  word 
recognition  algorithm  this  type  of  transformation  is  applied 
to  each  32-element  sample  frame  and  is  found  to  improve 
overall  accuracy  by  a factor  of  two  or  more.  It  has  not 
yet  been  implemented  in  the  continuous  speech  algorithm. 

An  important  potential  benefit  of  processing  several 
successive  frames  in  this  manner  is  that  the  cross-correlations 
of  all  frames  at  all  pairs  of  frequencies  are  taken  into  account. 
The input pattern is now considered ready to be
matched against a set of reference templates, using the
statistical behavior of the reference data to determine
the closeness of fit to each template. From a study
of a large number of samples we have concluded that the
patterns at this processing stage are adequately modeled
by Gaussian distribution functions. To detect the occurrence
of previously learned patterns we implement a likelihood
receiver based on Gaussian statistics. Decision thresholds
for the resultant likelihood functions are calculated from
the statistics of measured likelihood scores of labeled
target patterns.

This  completes  the  recognition  procedure  for  intervals 
of  half  a syllable  of  speech.  To  recognize  a sequence  of 
patterns  in  the  word  spotting  task  we  set  the  decision 
thresholds  for  very  high  detection  probabilities  and  depend 
on  subsequently  applied  concatenation  rules  to  reject  false 
alarms.  The  sequence  of  detected  patterns  must  match  a list 
of  permissible  "spellings"  of  the  target  word,  and  in  addition 
each  detection  must  happen  within  prescribed  real  and  subjective 
time  bounds  relative  to  other  pattern  detections.  This  section 
of  the  algorithm  is  under  active  development,  and  is  expected 
to  change  significantly  as  more  experimental  work  is  completed. 
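The concatenation rules can be pictured with a minimal sketch. The report states only that detections must match a permissible "spelling" and fall within real and subjective time bounds; the sketch below checks a single real-time bound between successive detections and ignores subjective time. The pattern labels, the `max_gap` parameter, and the function names are hypothetical.

```python
def match_spelling(detections, spelling, max_gap):
    """Find completions of one spelling in a time-ordered detection list.

    detections: list of (time_in_seconds, pattern_label), sorted by time.
    Returns (start_time, end_time) pairs, one per completed spelling.
    """
    hits = []
    for i, (t0, p0) in enumerate(detections):
        if p0 != spelling[0]:
            continue
        t_prev, pos = t0, 1
        for t, p in detections[i + 1:]:
            if pos == len(spelling):
                break
            if t - t_prev > max_gap:   # next pattern arrived too late
                break
            if p == spelling[pos]:
                t_prev, pos = t, pos + 1
        if pos == len(spelling):
            hits.append((t0, t_prev))
    return hits


def spot_word(detections, spellings, max_gap):
    """Accept the word if any permissible spelling is completed."""
    hits = []
    for spelling in spellings:
        hits.extend(match_spelling(detections, spelling, max_gap))
    return sorted(hits)


detections = [(0.00, "K1"), (0.10, "X"), (0.15, "K2"), (0.30, "K3"),
              (2.00, "K1"), (2.90, "K2"), (3.00, "K3")]
spellings = [["K1", "K2", "K3"], ["K1", "K3"]]
word_hits = spot_word(detections, spellings, max_gap=0.25)
```

The second group of detections is rejected because too much time passes between "K1" and "K2", which is the kind of false alarm the timing bounds are meant to remove.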

A detailed flow chart of the algorithm is presented in
Chapter II.
2.  Statistics of the Feature Measurements


From  the  point  of  view  of  a given  pattern  recognition 
algorithm,  departures  of  the  measured  input  features  from 
perfect  pattern  matches  arise  from  unknown  and  unpredictable 
sources  and  are  to  be  treated  as  random  variables.  If  this 
were  not  the  case,  one  could  show  that  closer  pattern  matches 
are  possible  by  implementing  an  improved  algorithm  prior  to 
the pattern matcher. It is therefore important to analyze
the  sample  distribution  functions  of  the  measurements  in 
order  to  find  the  best  probabilistic  decision  procedures 
to  use  and  to  help  discover  new  deterministic  factors  that 
can  improve  the  recognition  algorithm. 

The accompanying figures show frequency of occurrence
histograms for individual feature measurements (single
coordinates of the output of box F or box G, Figure 1),
for repeated applications of a particular speech sound
embedded in the speech of many different talkers. The typical
histogram, Figure 2, has a nondescript bell curve shape.
A goodness-of-fit (χ²) test applied to the top curve of
Figure 2, for example, indicates that the probability of
getting a random departure this large from a fitted normal
distribution is 0.7 (the total number of samples is 60).


Figure 2. Typical distributions of speech parameter data.
The horizontal axis is scaled in such a way that the mean
value is centered in the display and the range shown is
plus and minus 4 standard deviations.

To get a more precise estimate of the average shape
of the distributions, a composite sample frequency histogram
was developed by summing a large number of the histograms
like Figure 2 after scaling the horizontal axis of each by
subtracting its mean value and dividing by its standard
deviation. Thus if all the distributions were rectangular,
the composite would be rectangular, etc. The resulting
frequency function, Figure 3, representing some 230,000 events,
makes a very close fit to a Gaussian curve and suggests that
the likelihood processor should employ a Gaussian model for
the distributions of pattern data.
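The construction of the composite histogram can be sketched as below. The bin width of 1/4 standard deviation and the range of plus and minus 4 standard deviations follow the figure captions; the use of the sample standard deviation with an n-1 divisor and the function names are assumptions.

```python
import math


def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def composite_histogram(sample_sets, bin_width=0.25, half_range=4.0):
    """Sum the histograms of several measurements after standardizing each."""
    n_bins = int(round(2.0 * half_range / bin_width))
    counts = [0] * n_bins
    for values in sample_sets:
        for z in standardize(values):
            b = int(math.floor((z + half_range) / bin_width))
            if 0 <= b < n_bins:          # values beyond the range are dropped
                counts[b] += 1
    return counts


# Two measurements with different means and scales but the same shape
# fall into the same bins once each has been standardized.
sample_sets = [[1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]]
counts = composite_histogram(sample_sets)
```

Because each measurement is rescaled separately, the composite reflects only the shape of the distributions, not their individual means or spreads.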

It is not immediately clear why the sample
frequency functions have a Gaussian shape. The signal
processing itself contains some averaging which by the
central limit theorem would tend to produce normal distributions.
It will require further study to determine whether or not
the observed functions are actually produced as an artefact
of the measurement process. In any event some of the
sample distributions have had a decidedly non-Gaussian form,
and these unusual distributions will be discussed in more
detail.

Figure 3. Composite statistical frequency function made
by summing 3,840 of the curves illustrated in Figure 2 of
the distributions of various measurements within
phonetically specific classes. On the horizontal axis the
class intervals are spaced at intervals of 1/32 standard
deviation; in the other figures the intervals are 1/4
standard deviation. Otherwise, the horizontal scaling is
the same.

Figure 4 shows a distribution having a tall peak
with some broadly distributed data to the left of it.
This histogram represents the distribution of scaled
spectral power (output of box F, Figure 1) at 4 KHz for
the initial /n/ sound in the word /nine/. Because 4 KHz is
above the upper cutoff frequency of the anti-aliasing
filter and the /n/ is a relatively low-amplitude sound,
the computed spectrum was noisy, except at the formant
frequency peaks, because of truncation errors in the
analogue to digital converter. The curve seems to be a
composite of two distributions: first, the tall spike
produced by those samples having sufficient energy to yield
an accurate spectrum, and second, the relatively much broader
distribution of low energy noisy samples caused by the
logarithmic scaling tending to a large negative number as
the amplitude goes to zero. This is a clear case of a
processing artefact, and was actually not noticed until
these histograms were produced and examined. On installing
a higher resolution analogue to digital converter,
distributions of this kind no longer occurred.

A second type of strange distribution is illustrated
by several examples in Figure 5. The chance of
getting such large departures at random from a normally
distributed population ranges from about 10^(exponent
illegible) to less than 10^-7. These curves show two
distinct peaks, which turn out to be related to two
distinctly different pronunciations of the target sounds.
The bimodality is eliminated by partitioning the sample
space into two new classes and making a separate reference
pattern for each distinct pronunciation.

Figure 5. (clockwise from lower left). Examples of
bimodal frequency histograms.

Figure 6 illustrates another processing artefact,
a difference between processed telephone speech and high
fidelity speech. At low frequencies the relative amplitude
of high quality speech as seen by the computer drops so
rapidly that the equalization software is apparently unable
to compensate. At higher frequencies there is no significant
difference between corresponding distributions of telephone
and high fidelity speech. Actually the difference was
brought about by the presence of low frequency hum and
noise in all the recordings used for the telephone data base.


Figure 6. Distribution of observed spectral amplitude at
constant frequency for a fixed utterance over the data base
of high quality speech. The histograms are scaled so that
the mean amplitude for telephone speech is centered
horizontally in each display, and the horizontal width of each
display is plus and minus 4 standard deviations referred
to telephone speech.

Left: Typical of low frequencies (0 and 150 Hz spectrum terms).

Right: Typical of higher frequencies.
3.  Experimental Data and Probability Models


Detection  probability 


For one pattern $\mathbf{x} = (x_1, x_2, \ldots, x_{96})$ the likelihood
function relative to the kth reference template is

$$\lambda_k(\mathbf{x}) = \sum_{i=1}^{96} \left[ \ln \sigma_{ik} + \frac{(x_i - \mu_{ik})^2}{2\sigma_{ik}^2} \right],$$

where $\mu_{ik}$ and $\sigma_{ik}$ are the mean and standard deviation,
respectively, of $x_i$ given that $\mathbf{x}$ is a sample of the kth
speech sound class. The model is Gaussian and assumes
uncorrelated variables, so $\lambda_k$ ideally is the sum of 96
independent squared normal deviates and has a χ² distribution.
The situation is complicated by the fact that we use
sample estimates of the means and variances; this produces
an "offset χ²" function. In addition the $x_i$ are of course
not quite independent. However, because the number of degrees
of freedom is large, $\lambda_k(\mathbf{x})$ should have an approximately
normal distribution. Figure 7 shows sample detection
functions for the training sets of several templates, plotted
on a probability scale in which a normal distribution would
be represented by a straight line with slope 1.
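A Gaussian likelihood score of this kind, for uncorrelated variables, might be computed as in the sketch below. The exact scaling of the score is an assumption: the sketch uses the negative log of a diagonal Gaussian density with its constant term dropped, so smaller scores mean closer fits. The threshold comparison and the function names are likewise hypothetical.

```python
import math


def likelihood_score(x, mean, sd):
    """Sum over elements of ln(sd_i) + (x_i - mean_i)^2 / (2 * sd_i^2).

    Assumed form: negative log of an uncorrelated Gaussian density,
    constant term omitted. Smaller values indicate a better match.
    """
    return sum(math.log(s) + 0.5 * ((xi - m) / s) ** 2
               for xi, m, s in zip(x, mean, sd))


def detect(x, mean, sd, threshold):
    """Declare a detection when the score does not exceed the threshold."""
    return likelihood_score(x, mean, sd) <= threshold


# A 96-element template with unit standard deviations (illustrative values).
template_mean = [0.0] * 96
template_sd = [1.0] * 96
exact_match = likelihood_score(template_mean, template_mean, template_sd)
one_sd_off = likelihood_score([1.0] * 96, template_mean, template_sd)
```

With every element one standard deviation away from the template, each term contributes 0.5, so the score grows in proportion to the number of elements.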


Figure 7. Detection curves for several 96-element
speech patterns. (Horizontal axis: detection threshold,
relative to sample mean.)

Figure 8 illustrates a case in which the joint
detection and false alarm probabilities for two speech patterns
behave as if the events were independent. The curve labeled
SYL31 represents a sequence of spectrum frames beginning in
the initial /th/ of the word /three/ and ending in the /r/.
The curve labeled SYL32 is for the sound which ends in the /i/
of /three/ and begins after the /r/ in an /i/ sound. The
curve labeled JOINT is the joint ROC curve for detection
of both sound patterns in correct temporal order. There are
31 targets and about 290 false alarms possible in each case.
The joint ROC curve is predicted rather accurately by simply
multiplying corresponding detection and false alarm
probabilities from SYL31 and SYL32 to get a predicted point
on the joint ROC curve. This apparently independent
behavior holds frequently for detection statistics but
rarely for false alarm statistics. A more general model
for false alarm events is discussed in the next section.
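The multiplication rule for predicting the joint ROC curve under independence can be written out as a small sketch. The operating points below are illustrative numbers, not data from the report, and the function name is hypothetical.

```python
def predicted_joint_roc(roc_a, roc_b):
    """Predict joint ROC points by multiplying corresponding probabilities.

    Each ROC is a list of (false_alarm_probability, detection_probability)
    pairs taken at corresponding threshold settings. Independence of the
    two patterns is assumed for both coordinates.
    """
    return [(fa_a * fa_b, d_a * d_b)
            for (fa_a, d_a), (fa_b, d_b) in zip(roc_a, roc_b)]


# Illustrative operating points for two half-syllable patterns.
roc_first = [(0.10, 0.90), (0.20, 0.95)]
roc_second = [(0.20, 0.80), (0.30, 0.90)]
joint = predicted_joint_roc(roc_first, roc_second)
```

As the text notes, this prediction is usually adequate for the detection coordinate and usually too optimistic for the false alarm coordinate.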

Figure 8. ROC curve for detection of the word /three/
in an environment of isolated digits using the continuous
speech algorithm. The curves are derived from 32 male
voices recorded over a variety of standard telephone
connections. (Horizontal axis: probability of false alarm.)

In the case of a closed vocabulary set in which
templates for all received sounds are available the situation
is somewhat different. This case is worth discussing because
as we increase the number of reference patterns we attain
an increasingly fine covering of the space of all speech
sounds. Ordinarily the likelihood $\lambda_k(\mathbf{x} \mid \mathbf{x}$ is in class k)
is positively correlated with the other likelihood functions
$\lambda_j(\mathbf{x})$ of the same argument. Nevertheless, experience with
closed vocabularies indicates that if

    $\lambda_m(\mathbf{x})$ = [max or min; operator illegible] $\{\lambda_j(\mathbf{x})\}$,  $\mathbf{x}$ in class k,

then

    $\Lambda_k(\mathbf{x})$ = [right-hand side illegible]

has a nearly Gaussian distribution. The ROC curve is
then readily computed from the distribution of $\Lambda$, as
shown in Figure 9. While this is a maximum likelihood
decision strategy it differs from the currently implemented
word spotting strategy in that the decision thresholds
are not fixed. This permits the detection rate to be
much higher without greatly increasing the false alarm
rate.
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Fijure  0.  roc  curves  for  a qua'^i  f orcod-cho ico  decision  rule 
Experimental  data  ar‘  I'lotted  for  a selected  8-word  vocabulai 
the  Ifl  digits  (r ) . and  a 34-word  vocabnlarv  (H)  in  a dit;crr>t(' 
word  recognition  task  '.’ith  tclepi'.or.e  s:  eoch  and  many  talkers. 


Probabilistic  false  alarm  model 


Because an arbitrarily chosen pattern might occur
anywhere in connected conversation, it will be assumed that
false detections of a single pattern are uniformly distributed
in time. If the time axis is partitioned into short intervals
of equal duration, the probability

$$\Pr\{A \text{ is in the } k\text{th interval}\} = P_A$$

associated with pattern A is the same for all values of k.
If the events {A is in the $k_1$th interval}, {A is in the
$k_2$th interval}, ... are independent for all sets of intervals,
then the probability that pattern A will not be detected
in any of T intervals is

$$\Pr\{A \text{ is not in interval } k_1 \text{ or } k_2 \text{ or } \ldots \text{ or } k_T\} = (1 - P_A)^T = e^{-\lambda_A T},$$

where $\lambda_A \equiv -\ln(1 - P_A)$. The expected number of detections
in T intervals is $P_A T$, and if $P_A$ is small then $P_A$ approaches
$\lambda_A$ and the distribution of the number of detections in T
intervals becomes Poisson with mean value $\lambda_A T$.
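The relation between the per-interval probability and the Poisson rate can be checked numerically with a short sketch. The probability and interval count used below are illustrative values, not measurements from the report, and the function names are hypothetical.

```python
import math


def rate_from_probability(p):
    """Poisson rate per interval, defined as -ln(1 - p)."""
    return -math.log(1.0 - p)


def prob_no_detection(p, n_intervals):
    """Probability of no false detection in n independent intervals."""
    return (1.0 - p) ** n_intervals


def poisson_pmf(m, mean):
    """Probability of exactly m events for a Poisson law with the given mean."""
    return math.exp(-mean) * mean ** m / math.factorial(m)


p_interval = 0.01        # illustrative per-interval false detection probability
n_intervals = 100        # illustrative number of intervals
rate = rate_from_probability(p_interval)
no_detection = prob_no_detection(p_interval, n_intervals)
poisson_zero = poisson_pmf(0, rate * n_intervals)
```

The two expressions for the probability of no detection agree exactly by construction; the approximation enters only when the rate is replaced by the probability itself, which is accurate when the probability is small.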

The joint detection of two patterns A and B is a
coincidence of the events {A is in the kth interval} = {A}
and {B is in the jth interval} = {B}. If the events were
independent their joint detection probability would be

$$\Pr\{A \text{ and } B\} = P_A P_B,$$

and the expected number of coincidences in T time intervals
would be $P_A P_B T$. Experiment shows, however, that this estimate
is too small, typically by a factor of 2 or 3. To arrive
at a closer estimate we must admit that the events are
not independent and consider the conditional probability

$$\Pr\{B \mid A\} = \frac{\Pr\{A \text{ and } B\}}{\Pr\{A\}}.$$

The experimental result noted above suggests that we try
to put

$$\Pr\{B \mid A\} = a \Pr\{B\}, \qquad (1)$$

with the parameter a between 2 and 3. This formula asserts
that if pattern A has been detected, then pattern B is a
times more likely to be detected in the specified coincident
time interval than in a randomly chosen interval.

To extend the model to cover multiple events without
introducing additional parameters, we will suppose that the
conditional probability of the nth event in a sequence depends on
the immediately preceding event sought, but not on any
of the earlier events. The first event is uniformly and
independently distributed in time as before. Thus

$$\Pr\{A_n \mid A_1 \text{ and } A_2 \text{ and } \ldots \text{ and } A_{n-1}\} = \Pr\{A_n \mid A_{n-1}\} = a \Pr\{A_n\}. \qquad (2)$$

Under these assumptions the probability $P_n$ of jointly
detecting n specified patterns in n associated coincidence
intervals is

$$P_n = \Pr\{A_1 \text{ and } A_2 \text{ and } \ldots \text{ and } A_n\} = a^{n-1} \Pr\{A_1\} \Pr\{A_2\} \cdots \Pr\{A_n\}, \qquad (3)$$

and the expected number of joint detections in T intervals
associated with $A_1$ is $P_n T$. Thus our model is of joint
false alarms as Markov chains.
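Chaining the conditional probability of equation (2) along a sequence gives the joint probability of equation (3), which can be sketched as follows. The individual probabilities are illustrative values, not entries from the report's tables; the interval count uses the 0.17 second interval and 4.44 minutes of speech quoted for Table I. The function names are hypothetical.

```python
def joint_false_alarm_probability(probs, a):
    """Joint probability of a chain of false detections.

    The first event has its unconditional probability; each later event
    has `a` times its unconditional probability, given the one before it.
    """
    result = probs[0]
    for p in probs[1:]:
        result *= a * p
    return result


def expected_joint_false_alarms(probs, a, n_intervals):
    """Expected number of joint false alarms in n_intervals intervals."""
    return joint_false_alarm_probability(probs, a) * n_intervals


# Illustrative unconditional probabilities for three patterns, with a = 3.
probs = [0.01, 0.02, 0.05]
n_intervals = 4.44 * 60.0 / 0.17        # about 1567 intervals
expected = expected_joint_false_alarms(probs, 3.0, n_intervals)
```

Note that nothing in this formula prevents the product from exceeding 1 when the individual probabilities are large.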

Since the unconditional probabilities $\Pr\{A_i\}$ vary
from pattern to pattern, the predicted results for joint
detections of several patterns do not lie on a simple curve.
Table I, however, permits a numerical comparison of the
theoretical model with experiment. The unconditional
probabilities for four speech patterns are tabulated on the
left; these patterns are extracted from the beginning and
end of the words /one/ and /six/ in contexts of continuous
digits. Detection thresholds and coincidence intervals were
adjusted for >90% correct detection of the multiple event
$\{A_1, A_2, A_3, A_4\}$, which represents the phrase /one six/ (part
of a spoken telephone number). The false alarm data were
gathered from a set of two-second segments taken at random from
recordings of 15 male subjects reading a script over various
switched telephone network connections. The false alarm data
did not contain any spoken numbers, but the numbers for
correct detection were taken from different portions of the
same recorded scripts. Four increasingly long joint events
were set up in the form of computer parameters and
the false alarm data base of 4.4 minutes' cumulative duration
was searched for possible false alarms. It can be seen from
the table that the predicted and observed numbers of false
alarms compare favorably.

Table I. Column headings (table body not recoverable):
   - Observed unconditional probability $\Pr\{A_i\}$ of false
     detection in an interval of 0.17 second
   - Event
   - Joint false alarm rate in 4.44 minutes of speech,
     predicted by equation (3) with a = 3.0
   - Observed false alarm rate

Equation (3) has a serious flaw, because if the
individual event probabilities are large the predicted joint
probability can be greater than 1. This trouble can be
handled readily by assuming that the conditional events
{B|A}, like {A}, obey Poisson statistics. Then if we
put

$$\lambda_{B \mid A} = a \lambda_B,$$

we are asserting that the Poisson rate function $\lambda$ increases
by the factor a whenever pattern A occurs.
Under this assumption equation (3) becomes

$$P_n = \Pr\{A_1\} \prod_{i=2}^{n} \left(1 - e^{-a\lambda_i}\right), \qquad (4)$$

where $\lambda_i$ is the rate function associated with the event $A_i$.
For comparison purposes equation (4) was used to generate
predicted false alarm rates given in Table II, which relates
to an experiment similar to the one described above but
with generally more diffuse likelihood functions.
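The effect of the Poisson correction can be seen in a short sketch comparing it with the simple product of equation (3). It assumes each rate is obtained from the unconditional probability as -ln(1 - P), following the definition given for a single pattern; the probabilities are illustrative and the function names are hypothetical.

```python
import math


def joint_probability_product(probs, a):
    """Equation (3) form: first probability times a * p for each later event."""
    result = probs[0]
    for p in probs[1:]:
        result *= a * p
    return result


def joint_probability_poisson(probs, a):
    """Equation (4) form: each later factor is 1 - exp(-a * rate_i).

    Assumption: rate_i = -ln(1 - p_i) for each unconditional probability.
    """
    result = probs[0]
    for p in probs[1:]:
        rate = -math.log(1.0 - p)
        result *= 1.0 - math.exp(-a * rate)
    return result


large = [0.5, 0.5, 0.5]          # large probabilities, where (3) fails
small = [0.001, 0.002, 0.001]    # small probabilities, where both agree
product_large = joint_probability_product(large, 3.0)
poisson_large = joint_probability_poisson(large, 3.0)
product_small = joint_probability_product(small, 3.0)
poisson_small = joint_probability_poisson(small, 3.0)
```

Each factor in the corrected form lies between 0 and 1, so the joint probability can never exceed the probability of the first event.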


Table II. (Table body not recoverable.)


Variance  of  estimated  detection  probability 


Each of N talkers makes n trials of a target word.
We assume that the i-th talker has a certain detection
probability $p_i$ for the target word, and that the trials
are independent. Then the number $x_i$ of detections has
a binomial distribution with mean value

$$E\{x_i \mid p_i\} = n p_i \qquad (1)$$

and variance

$$\mathrm{Var}\{x_i \mid p_i\} = n p_i (1 - p_i). \qquad (2)$$

From the total set of nN trials we estimate the detection
probability for the whole population of talkers, p, from

$$x = \sum_{i=1}^{N} x_i, \qquad \hat{p} = \frac{x}{nN} \qquad (3)$$

and we want to know the variance of the estimated probability
$\hat{p}$,

$$\sigma_{\hat p}^2 = E\{\hat p^2\} - E\{\hat p\}^2 = \frac{1}{n^2 N^2}\left[E\{x^2\} - E\{x\}^2\right]. \qquad (4)$$

The expectations in (4) satisfy $E\{x\} = N E\{x_i\}$ and
$E\{x^2\} - E\{x\}^2 = N\,[E\{x_i^2\} - E\{x_i\}^2]$,
the latter because the $x_i$ are uncorrelated. We find
$E\{x_i\}$ by integrating the conditional expectation as follows:

$$E\{x_i\} = \int_0^1 E\{x_i \mid p_i\}\, f_p(p_i)\, dp_i \qquad (5)$$

where $f_p(p)$ is the probability density function of talkers'
detection probabilities, i.e., the relative probability
of finding a talker whose detection rate is p.


From (1) and (5),

$$E\{x_i\} = \int_0^1 n p_i\, f_p(p_i)\, dp_i \qquad (6)$$

which gives one of the terms needed in (4). To get the
other term we first compute



The first term in (9) is the contribution of the
population variance on the assumption that the detection
probability of each talker is accurately known; for large
n, equation (9) approaches the result that would be obtained
by direct application of the central limit theorem. On the
other hand, if n = 1 the result has the same form as the
variance of the binomially distributed $\hat p$. From (2), the
variance of $x$ given that n = 1 and $p = E\{p_i\}$ is equal to
$N p (1 - p)$. Thus

$$\sigma_{\hat p}^2 \Big|_{n=1} = \frac{p(1-p)}{N}. \qquad (10)$$



If  the  total  number  nN  of  trials  is  fixed,  equation  (9) 
shows  that  the  best  experimental  result  should  be  obtained 
by  taking  one  trial  from  each  talker  so  as  to  maximize  the 
number  of  talkers.  If  the  number  of  talkers  is  fixed 
then  the  number  of  trials  per  talker  should  be  as  large 
as  possible. 
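
Equations (7) through (9) are not legible in this copy. The lines below are a sketch of one derivation that leads to a result with the properties the text attributes to (9): a population-variance term that dominates for large n, and a binomial form for n = 1. The symbols $\bar p$ and $\sigma_p^2$ (mean and variance of the density $f_p$) are notation assumed here, and the final line should be read as a reconstruction, not as the report's own equation.

```latex
\begin{align*}
\bar p &= \int_0^1 p\, f_p(p)\, dp, \qquad
\sigma_p^2 = \int_0^1 (p - \bar p)^2 f_p(p)\, dp \\
E\{x_i\} &= n \bar p \\
E\{x_i^2 \mid p_i\} &= n p_i (1 - p_i) + n^2 p_i^2 \\
E\{x_i^2\} &= n \bar p + n(n-1)\left(\sigma_p^2 + \bar p^{\,2}\right) \\
E\{x_i^2\} - E\{x_i\}^2 &= n \bar p (1 - \bar p) + n(n-1)\,\sigma_p^2 \\
\sigma_{\hat p}^2 &= \frac{N\left[E\{x_i^2\} - E\{x_i\}^2\right]}{n^2 N^2}
  = \frac{n-1}{n}\,\frac{\sigma_p^2}{N} + \frac{\bar p (1 - \bar p)}{nN}
\end{align*}
```

With nN held fixed the second term is constant and the first shrinks as N grows, which agrees with the advice to take one trial from each of as many talkers as possible. With N fixed, both terms together fall toward $\sigma_p^2/N$ as n increases.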


Since $0 \le p_i \le 1$, we can get a worst case bound on
both $\sigma_p^2$ and the distribution of $\hat p$. The worst case is the
one in which the density function of $p_i$ is concentrated at
$p_i = 1$ with probability $p = E\{p_i\}$ and at $p_i = 0$ with probability
$1 - p$. Then $\sigma_p^2 = p(1-p)$. Setting this into equation (9)
as an upper bound yields

$$\sigma_{\hat p}^2 \le \frac{p(1-p)}{N} \quad \text{always.} \qquad (11)$$

In the event that the $p_i$ actually assume the worst case
distribution, the distribution of $N\hat p$ is binomial with
mean $Np$ and variance $Np(1-p)$, reflecting the fact that
if $p_i$ is zero or one there is nothing to be learned by
taking more than one sample per subject. A more optimistic
case is considered in the section on confidence limits.


The sample variance

$$S^2 = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{x_i}{n} - \hat p\right)^2$$

computed from the observed values $x_i$ is an unbiased estimator
of $(N-1)\,\sigma_{\hat p}^2$. If the original population is normally
distributed, then $t = (\hat p - p)(N-1)^{1/2}/S$ is a random variable
having a Student's t distribution with N-1 degrees of
freedom. This statistic can be used to find confidence
intervals for p based on observations from one experiment.
Confidence limits on the sample statistics


A lower c-percent confidence limit on, for example,
the detection probability is found by computing an assumed
value $p_c$ for $E\{p_i\}$ which is so low that the observed
value of $\hat p$ lies in the c-th percentile of the distribution
function of $\hat p$. Then we can bet that on repeated runs
of the experiment the interval $[p_c, 1]$ will contain
$E\{p_i\}$ c percent of the time.

To make this calculation it is necessary to assume
a distribution function for $\hat p$. In the worst case, described
by equation (11), it is binomial with number of trials N
and probability $p_c$. Note that this case applies
exactly if n = 1 trial per subject. A table of binomial
confidence limits is included in the Appendix to this report.

A less pessimistic estimate based on some experience
results from supposing that the true population density
of detection probabilities $p_i$ is J-shaped, roughly exponential
with the most likely $p_i$ near 1 and a long tail extending to
low probabilities. For such a function the variance of $p_i$
is approximately $\sigma_p^2 = (1-p)^2$. Invoking the central limit
theorem we assume that $\hat p$ is normally distributed. For a
particular value of c we will want $\hat p$ to be some number t
of standard deviations above the mean value $p_c$. Evaluating
the standard deviation by substituting $\sigma_p^2 = (1-p_c)^2$ into
equation (9) we find that $p_c$ satisfies

$$t\,[\,(n-1)(1-p_c)^2 + \cdots$$
Overall Performance Of The Algorithm


Forty-One  male  subjects  made  recordings  of  a 
six-minute  script  which  were  processed  to  generate  seven 
syllable-like  spectral  reference  patterns  for  the  target 
word  /Kissinger/.  Each  pattern  had  an  alternate,  to 
accommodate  variant  pronunciations.  Any  sequence  of 
pattern  detections  satisfying  the  temporal  windowing 
rules  was  counted  as  a detection  of  the  target  word. 
Detection  of  either  alternate  was  permitted,  regardless 
of  which  alternates  for  other  syllables  were  detected. 
Recordings  of  ten  additional  male  subjects  were  played 
into  the  system  to  test  the  detection  and  false  alarm 
rates  at  various  pattern  detection  threshold  settings. 

All  recordings  were  made  with  high-quality  microphones 
in  a relatively  quiet  environment.  This  material  was  used 
for  a benchmark  demonstration  witnessed  by  RADC  personnel 
at  the  end  of  the  contract  period. 

Much  of  the  development  had  been  carried  out  using 
a subset  of  9 subjects  for  training  and  4 subjects  for  test 
data  bases;  scores  of  90%-95%  detection  at  4 to  6 false 
alarms  per  hour  were  typical  for  this  smaller  data  base. 
With the larger number of training and test subjects, the
average scores were considerably lower, as seen in Table
III and Figure 10.

Figure 10. ROC curves for the continuous speech
word spotting algorithm derived from the data in
Table III.


Earlier  tests  of  a 25-subject  telephone  speech  data  base 
yielded  intermediate  scores,  consistent  with  the  idea  that 
the  performance  of  the  system  is  a function  of  the  number 
of  different  talkers  used  to  compile  the  reference  patterns. 
Several  other  phrases  and  data  bases  of  female  voices  were 
tested  with  generally  similar  results.  The  numerical 
results  presented  here  are  the  only  ones  generated  by  the 
algorithm as described with enough data taken under comparable
conditions to produce an ROC curve. It can be seen
in Table III that a minority of the talkers tend to produce
most of the false alarms. This suggests that improved
performance might be attained whenever it is possible to
select talkers or, as suggested by the earlier results
with small populations, limit the number of talkers to be
recognized. Figure 10 was produced from the data in Table
III by figuring the false alarm rate per 0.5 second, the largest
duration of the target word observed. At the low end of
the  curve  with  2 or  3 false  alarms  per  hour,  the  error  of 
measurement  is  large  on  a relative  scale;  nevertheless 
the  fit  to  an  unbiased  ROC  curve  for  normally  distributed 
data  is  good. 

Under the same conditions the closest-fitting
reference patterns for each half-syllable were further
trained to each talker's voice by averaging the mean
value statistics with the observed spectral data for the
first  instance  of  the  target  word  in  the  script;  then  the 
detections  and  false  alarms  were  counted  for  the  talker's 
reading  of  the  entire  script,  discounting  the  single 
training  word.  The  observed  data  were  given  a weight 
of  0.75  in  computing  the  averaged  mean  value  parameters, 
while  the  previous  values  of  the  parameters  were  given 
a weight  of  0.25.  The  variance  parameter  was  left 
unchanged,  as  were  the  parameters  of  the  temporal 
window  masks.  The  same  set  of  pattern  detection 
thresholds  for  the  likelihood  functions  was  used; 
thus  a new  ROC  curve  was  generated  under  conditions 
which  corresponded  exactly  with  the  conditions  of  the 
first  curve,  except  for  the  training  of  the  mean  value 
parameters (top curve of Figure 10). We expected the
detection  rate  to  increase  greatly,  since  the  patterns 
were  now  tuned  to  the  talker;  for  the  same  reason  the 
false  alarm  rate  should  have  remained  the  same  or 
increased.  Surprisingly,  the  average  detection  rate 
increased  only  slightly,  while  the  false  alarm  rate 
significantly decreased. This effect remains unexplained,
though we suspect that an interaction between the pattern
likelihood detector settings and the optimum parameters
for the subsequently applied concatenation rules is
responsible.
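
The talker adaptation step described above amounts to a fixed-weight average of each stored mean value with the corresponding observed spectral value. A minimal sketch follows, assuming the mean statistics and the observed frame are plain lists of numbers of equal length; the function and argument names are hypothetical, and only the weights 0.75 and 0.25 come from the text.

```python
def adapt_means(reference_means, observed, observed_weight=0.75):
    """Blend stored mean values with one observation of the target word.

    The variance parameters and temporal window masks are not touched,
    matching the description in the text.
    """
    if len(reference_means) != len(observed):
        raise ValueError("mean vector and observation must have equal length")
    previous_weight = 1.0 - observed_weight  # 0.25 for the default weight
    return [
        observed_weight * obs + previous_weight * ref
        for ref, obs in zip(reference_means, observed)
    ]


# Hypothetical two-point example.
adapted = adapt_means([0.0, 4.0], [4.0, 0.0])
```

With so much weight on a single observation, the adapted pattern follows the talker's one training token closely.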
Applying the confidence level methods to the experimental
results, we had for the small data base one
missed detection in a total of four trials for each of
four subjects, and 6 false alarms per hour. The experimental
detection rate is therefore 0.94 and the
sample standard deviation S = 0.108. The Student's t
distribution method gives a lower 90% confidence level
$p_c$ = 0.84. The non-parametric method with the binomial
distribution yields $p_c$ = 0.50. The presumably more
valid result from Table III at 6 false alarms per hour
is 0.70. The confidence levels for this latter probability
derived from 10 talkers are 0.57 for the t distribution
method and 0.45 for the non-parametric method.
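
The small data base figures can be checked by hand. The sketch below assumes that the one missed detection fell to a single talker (detections of 4, 4, 4 and 3 out of 4 trials), that S is the root of the 1/N sample variance of the per-talker rates, and that the lower limit is the observed rate minus t S / sqrt(N-1). The value 1.638 is the standard one-sided 90% point of Student's t with 3 degrees of freedom; the function names are illustrative.

```python
import math


def sample_statistics(detections, trials_per_talker):
    """Return (p_hat, S) from per-talker detection counts."""
    n_talkers = len(detections)
    rates = [x / trials_per_talker for x in detections]
    p_hat = sum(rates) / n_talkers
    s_squared = sum((r - p_hat) ** 2 for r in rates) / n_talkers
    return p_hat, math.sqrt(s_squared)


def lower_limit_t(p_hat, s, n_talkers, t_quantile):
    """Lower confidence limit from the t statistic (p_hat - p) sqrt(N-1) / S."""
    return p_hat - t_quantile * s / math.sqrt(n_talkers - 1)


p_hat, s = sample_statistics([4, 4, 4, 3], 4)
p_c = lower_limit_t(p_hat, s, 4, 1.638)
```

Under these assumptions the computed rate, S and limit agree, to the rounding used in the text, with the quoted 0.94, 0.108 and 0.84.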

Further development effort, particularly on the
concatenation rules and implementation of the remaining
pieces of the originally proposed algorithm, may be expected
to improve the performance figures. The sharp
difference in performance between small and large populations
serves as a warning that large data bases are
needed to assess talker-independent performance; the
probability models we have presented underscore this
requirement. These models also suggest that improvement
can be made by setting the pattern detection
in a forced-choice decision context and by revising
the pattern concatenation rules to incorporate
a priori information on the probabilities of alternate
syllable "spellings" of the target word.


II  SOFTWARE 

The  following  sections  are  a description  and 
flowchart  of  the  software  pertinent  to  an  understanding 
of  the  Key  word  spotting  algorithm.  The  program  of 
interest  is  SPIN3M,  SPeech  INterpreter  3,  Multiple 
word  spotting. 

Functions  Of  The  Program 

The functions of SPIN3M, as controlled by the
keyboard and console, are:

1.  To  accept  specification  of  (up  to  2)  target  words, 
read  in  the  appropriate  reference  file  of  statistical 
data,  and  to  initialize  appropriate  tables  for  the 
search  of  words.  This  function  has  not  been  flowcharted, 
but reference is made to its routine, called "SETPTR".

2.  To input continuous speech, and search at a real
time rate for occurrences of either of the two target
words. While doing so, information concerning the
status of the algorithm is to be output to the keyboard,
and special identifying symbols are to be output
upon successful key word detection. The function
is performed by the subroutine SPIN4M, whose flowchart
is on flowchart pages 66 through 78.

3.  ... detection, a false alarm, or at any other time. This
is done as part of SPIN4M, with conditional exits at
appropriate algorithm points.

4.  To  search  for  words  on  a non-real  time  basis,  in  the 
2.56  seconds  of  speech  data  stored  at  the  time  of  the 
real  time  search  halt.  The  subroutine  SPN4NR  performs 
this  function,  and  is  flowcharted  on  page  79. 

5.  To calculate on a non-real time basis, for each 10
ms. interval of the last 2.56 seconds, the likelihood
that a given pattern existed. This is done by IPART2,
found on flowchart page 80. Flowchart pages 81-91
document all other routines necessary for functions
2-5, and are referenced as subroutines on the first
15 pages of the flowchart.
LANGUAGE AND GROSS STRUCTURE OF THE PROGRAM


SPIN3M is written in 3 languages; consequently
it may be said that there are 3 levels of control during
its execution.

The top level is under FORTRAN control, which
executes all operations for which time is no consideration.
This includes I/O (except PDP-11 to Vector
Processor I/O), and the keyboard interactive command
interpreter. After accepting commands from the keyboard,
the FORTRAN code calls the necessary PAL subroutines.
The FORTRAN routines are not flowcharted.

The  middle  level  of  control  is  PAL,  or  PDP-11 
assembly  language  code.  The  PAL  code  is  organized 
as  subroutines  which  are  called  to  execute  the  real 
or  non-real  time  word  spotting  functions.  PAL  is 
used  to  generate  most  of  the  pattern  concatenation 
(pattern  sequencing)  control  logic,  to  control 
vector  processor  operations,  and  generally  to 
direct  the  word  spotting  algorithm.  PAL  routines 
are  described  on  flowchart  pages  66-82. 

The bottom level of control is within the vector
processor, as instructed by Vector Computer Assembly
language, or VCASM code. The PAL subroutines direct the
relinquishing of bus mastership from the PDP-11 to a special
high-speed array processor. This array or vector
processor performs fast calculations, facilitating execution
of preprocessing, time registration, and sound
unit (pattern) similarity computation. During the execution
of a vector processor routine, the vector processor
may read or write to the PDP-11 memory, starting at the
address contained in the vector computer bus address register.
Following the completion of a vector processor
routine, the vector processor halts and control returns
to the PDP-11, resuming the execution of PAL code. The
vector processor routines are flowcharted as subroutines
on flowchart pages 83-91.


PROGRAM DATA STRUCTURES


All  PAL  and  VCASM  variables  are  integers,  with  a 
maximum  of  16  and  32  bit  resolution  respectively.  All 
PDP-11  arrays  are  composed  of  16  bit  integers.  The 
arrays  important  to  the  key  word  spotting  algorithm  may 
be  categorized  into  two  types:  buffers  for  input  data 
and  arrays  of  reference  and  status  data. 

The contents of the input data buffers may change
with every new input frame, and those that accumulate
data over a number of frames must be circularized to avoid
an attempt to store data beyond their upper boundary.
By "circularized", the following is meant. Each time
new data is added to the present buffer contents, the
buffer pointers are advanced to the destination of the
next datum, until they point past the end of the buffer.
At this time the destination pointer is reset to the
start of the buffer and old data is overwritten by the
next input frame. This circular data storing technique
implies that real time input data is retained for only a
brief interval, during which all processing and decision
making must be done. Based on considerations of space
limitations and algorithm performance, all input data
buffers necessary to real time key word spotting have
been circularized to a length corresponding to 2.56
seconds of input data, or 256 input frames.


Every 10 milliseconds a new "frame" is generated
by the hardware autocorrelator, and preprocessed by the
vector processor. The results of preprocessing are 3
data elements: a spectrum, the frame's subjective time,
and the frame's amplitude. The 32 point smoothed,
equalized, and log-transformed spectrum is calculated
and stored in the frame array JIN, as 32 consecutive
16 bit words. Thus JIN is a circular buffer of spectrum
frames, in temporal order, with one frame starting
every 64 bytes. The offset to the destination
of the next spectrum frame is contained in JINOFS.
The circularization of JIN is accomplished by making
JINOFS a modulo 16384 offset, that is, modulo 256x64.
The frame's subjective time is a 16 bit precision
integer, and is stored in 2 bytes of the array JARC.
The offset to the destination of the next frame's
subjective time is contained in JARCOF, which is
made modulo 512 = 256x2 to circularize JARC to a
length of 256 two byte subjective times. The amplitude
of the frame is initially output by the
vector processor as a 16 bit precision integer in
the final word of the 32 word spectrum. In this
manner it is used by the likelihood routines as a
parameter equivalent in importance to any spectrum
point. When real time analysis is halted, all the
amplitudes are stored in one buffer, to facilitate
non-real time analysis. This buffer is called JAMP,
has a length of 512 bytes, and is not circularized
since it has no real-time word spotting application.
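
The offset arithmetic for JIN and JARC can be illustrated with a short sketch. The buffer sizes and moduli are those given in the text; the helper names and the Python setting are assumptions made for illustration, since the original logic is PAL code that is not reproduced here.

```python
FRAMES_KEPT = 256        # 2.56 seconds at one frame per 10 ms
JIN_FRAME_BYTES = 64     # 32 spectrum words of 16 bits each
JARC_ENTRY_BYTES = 2     # one 16 bit subjective time

JIN_BYTES = FRAMES_KEPT * JIN_FRAME_BYTES      # 16384
JARC_BYTES = FRAMES_KEPT * JARC_ENTRY_BYTES    # 512


def advance(offset, step, size):
    """Move a byte offset forward one entry, wrapping at the buffer end."""
    return (offset + step) % size


def offsets_after(frames):
    """Return (JINOFS, JARCOF) after the given number of input frames."""
    jinofs = 0
    jarcof = 0
    for _ in range(frames):
        jinofs = advance(jinofs, JIN_FRAME_BYTES, JIN_BYTES)
        jarcof = advance(jarcof, JARC_ENTRY_BYTES, JARC_BYTES)
    return jinofs, jarcof
```

After exactly 256 frames both offsets return to zero, so frame 257 overwrites the oldest frame in each buffer.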

Every  10  milliseconds,  after  preprocessing,  a 
new  pattern  is  designated  as  a combination  of  three 
previous  input  frames.  The  pattern  designated  is 
associated  with  the  frame  that  was  input  31  frames 
ago.  The  designation  of  the  pattern  associated 
with  a given  time  involves  the  specification  of 
pointers  to  the  three  frames  of  the  pattern.  These 
pointers  are  stored  in  the  3 picked  frame  buffers 
FRAM1, FRAM2, and FRAM3. Only the past 256 patterns
are valid because data older than 2.56 seconds is lost;
thus any pointers to that data, which designate a pattern,
are meaningless. Since the pointers are 16 bit precision
integers, the construction of FRAM1, FRAM2, and
FRAM3 is identical to that of JARC, with the offset to
the pointer's destination corresponding to a time 31
frames behind the current frame time (see flowchart
page 68). In summary, there are five input data buffers
used in real time key word spotting, each updated
and circularized to a length of 256 entries every 10
milliseconds. See the table at the end of this section
for a summary of input data buffer utilization.
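
The 31 frame lag of the picked frame buffers can be expressed in the same modular arithmetic. The sketch assumes frame times counted from zero and two byte entries as in JARC; the function name and the counting convention are hypothetical.

```python
LAG_FRAMES = 31      # a pattern is designated 31 frames after its frame arrives
ENTRIES = 256        # circular length shared with JARC
ENTRY_BYTES = 2      # 16 bit pointers


def picked_frame_offset(frame_time):
    """Byte offset in FRAM1, FRAM2 or FRAM3 written at the given frame time.

    Python's % operator returns a non-negative result for a positive
    modulus, so times earlier than the lag wrap to the end of the buffer.
    """
    return ((frame_time - LAG_FRAMES) % ENTRIES) * ENTRY_BYTES
```

For example, at frame time 31 the entry for frame 0 is written at offset 0, and the same slot is reused 256 frames later.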

The remaining arrays that are important to real
time key word spotting may be categorized as containing
either reference or status information. These
arrays, once initialized at the beginning of the word
spotting run, remain either unchanged or are only slightly
modified. The arrays in this category are IFILT,
IPSTAR, IWDSYM, IWHS, and IWDAN, and the Word Descriptor
Arrays, contained in the module WORDS.

IFILT is the cosine transform matrix used to compute
the spectrum of the input data. This array remains
constant.  IPSTAR  is  a workspace  used  by  the  program 
when  calculating  the  prosodic  timing  characteristics  of  a 
spotted  word. 

When key word spotting initialization is executed,
the target words are set. Associated with each target
is a symbol, a Word Descriptor Array, and mean value and
standard deviation statistics. For the Kth target word,
the Kth element of IWDSYM is the associated symbol and the
Kth element of IWDAN is a pointer to the associated Word
Descriptor Array. Thus IWDSYM is an array of target word
symbols, and IWDAN an array of target word Word Descriptor
Array pointers. The mean value and standard
deviation statistics for each pattern of all legitimate
target words are read into the array IWHS. This also
is done at initialization. IWHS remains constant until
the operator chooses to introduce a new set of statistics.
The SETPTR subroutine assures that statistics for all
specified target words may be found in IWHS. It then
sets pointers to the statistics for each pattern of
every target word, in the target word's Word Descriptor
Array (WDA). Once this is done a complete definition
of each target word may be found in its WDA, and all
references to the relevant statistics in IWHS are made
through the pointers in the WDA.


Basic to an understanding of the program strategy
is an understanding of the structure of the Word Descriptor
Array. After initialization this array contains
a complete description of the target word's patterns and
timing, and all the necessary information concerning the
status of the search for this word (e.g., how many patterns
detected so far, etc.). The use of the WDA allows
the searches for multiple target words to be independent
and asynchronous. All information about algorithm status
exterior to the WDAs is target word independent.

The Word Descriptor Array is organized into three
sections: first the header, primarily containing information
on the history and status of the search; then a
series of pattern parameter lists yielding an exact
description of all the patterns, their alternatives, and
the interpattern timing; and finally two arrays used for
the prosodic timing tests which follow the detection of
the whole word pattern sequence.

The WDA header is presently 24 words long, but
is structured for easy extensibility. Associated with
each target is an "analysis time", which indicates how
much of the data presently in the input data buffers
has been searched. Analysis time is in the same units
as what we refer to as "real time", that is, one unit
corresponds to one new frame, or 10 milliseconds by the
clock. For every new input frame, the current real
time is incremented, and for every frame in the buffers
of past data which is processed and searched, the analysis
time is incremented. Each target word has its
own analysis time in its WDA; thus the search for one
target word may be 50 frames behind current real time,
while the search for another is 100 frames behind.
Analysis time is of course never ahead of current
real time, but is also never allowed to fall far
behind current real time, because that would imply
analysis of lost data. The target word's header contains
the analysis time associated with that target
word, called "T", and the corresponding "analysis"
subjective time array offset "JARGON". When a pattern
is detected, the logic updating the header notes this
by incrementing the count of patterns detected, saving
the real time of the pattern likelihood peak, saving
the likelihood at that peak, and saving offsets to the
subjective time and frame of the peak. Various timers
are also set to force timing constraints on the detection
of the next pattern. The header also is set to
point to a new pattern parameter list, in section 2 of the
WDA, designating it as the current target pattern.
See diagram for an exact description of the WDA header.

The WDA pattern parameter lists represent each of
the alternative patterns comprising the word pattern sequence.
These lists are linked to one another, in that
each pattern contains a pointer to its alternative pattern
parameter list if one exists, and a pointer to its
succeeding pattern parameter list if it is not the final
pattern in the word pattern sequence. The pattern parameter
list also contains pointers to the statistics for
that pattern, and real time timing constraints for the
detection of the succeeding pattern.
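
The linkage between pattern parameter lists can be pictured as two singly linked chains: one through alternates at a given position in the word, and one through successive positions. The sketch below is a loose model of that idea. The field names follow the listing (THRESH, WINDOW, ALTPAT, NXTPAT), but the class, the helper functions, and the assumption that the next position is reached through the first alternate's NXTPAT pointer are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatternParams:
    number: int                                 # position of this pattern in the word
    thresh: int                                 # likelihood threshold setting
    window: int                                 # max search duration for next pattern
    altpat: Optional["PatternParams"] = None    # alternate for the same position
    nxtpat: Optional["PatternParams"] = None    # list for the following position


def alternates(first: PatternParams) -> List[PatternParams]:
    """Collect a pattern and every alternate chained from it."""
    chain = []
    node: Optional[PatternParams] = first
    while node is not None:
        chain.append(node)
        node = node.altpat
    return chain


def word_positions(first: PatternParams) -> List[List[PatternParams]]:
    """List the alternates available at each position of the word."""
    positions = []
    node: Optional[PatternParams] = first
    while node is not None:
        positions.append(alternates(node))
        node = node.nxtpat
    return positions


# Hypothetical two-pattern word whose first pattern has one alternate.
p2 = PatternParams(number=2, thresh=10, window=0)
p1_alt = PatternParams(number=1, thresh=12, window=20, nxtpat=p2)
p1 = PatternParams(number=1, thresh=10, window=20, altpat=p1_alt, nxtpat=p2)
layout = [[p.thresh for p in pos] for pos in word_positions(p1)]
```

A null pointer ends each chain, which corresponds to the zero entries shown for the last pattern in the listing.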
Following the pattern parameter lists are 2
arrays used for prosodic timing parameter calculation
and testing. The first is an array of the maximum and
minimum allowable values for each of the 2n-1 prosodic
timing parameters in an n pattern word. The second
array is an n word buffer meant to contain the pattern
likelihood peak time for each pattern in the word pattern
sequence. It is filled during the word spotting
routine run. These peak times are used as raw data
for the prosodic timing parameter calculation and testing
routine described on flowchart pages 76 and 77.

A detailed  depiction  of  the  Word  Descriptor  Array 
contents  and  their  ordering  follows: 
SPIN4M (MULTIPLE WORD SEEKING ALGORITHM)

WORD DESCRIPTOR ARRAY K (FOR WORD W/ SERIAL #K)
SET WDAPTR = IWDAN(K)          "HEADER" INFO:

IWDAN(K)      PTR TO 1ST PATTERN PARAMETER LIST
 2(WDAPTR)    /WORD SPELLED OUT
 4(WDAPTR)     IN UP TO SIX
 6(WDAPTR)     ASCII CHARACTERS/
10(WDAPTR)    = CURPAT(WDAPTR)  ADDRESS OF PATTERN PARAMETER
                LIST FOR CURRENT PATTERN SOUGHT
12(WDAPTR)    = SUMPAT(WDAPTR)  CUMULATIVE # OF PATTERNS
                DETECTED SO FAR (FOR THIS WORD)
14(WDAPTR)    = T1(WDAPTR) = TP + WINDOW = EXPIRATION
                TIME OF CURRENT PATTERN SEARCH
16(WDAPTR)    UNUSED
20(WDAPTR)    = WDPCNT(WDAPTR)  # OF PATTERNS COMPRISING
                THE WORD
22(WDAPTR)    = TIMER(WDAPTR) = TP + WDLMIN = EARLIEST
                ACCEPTABLE TIME FOR TOTAL WORD END
24(WDAPTR)    = WDSTRT(WDAPTR)  FLAG SET IF WORD STARTED
26(WDAPTR)    = PASTRT(WDAPTR)  FLAG SET IF PATTERN STARTED
30(WDAPTR)    = T0(WDAPTR) = (T OF 1ST THRESH CROSSING)
                + #TRKTIM = EXPIRATION TIME OF PEAK
                TRACKING FOR THIS PATTERN
32(WDAPTR)    = TP(WDAPTR)  TIME OF LAST LIKELIHOOD PEAK
                FOR CURRENT PATTERN
34(WDAPTR)    = MAXL(WDAPTR)  LIKELIHOOD VALUE OF LAST
                LIKELIHOOD PEAK FOR CURRENT PATTERN
36(WDAPTR)    = UNUSED
40(WDAPTR)    = PTMAR(WDAPTR)  POINTER TO PROSODIC
                TIMING MAXIMUM AND MINIMUM ARRAY
42(WDAPTR)    = IPTRT(WDAPTR)  POINTER TO BE STEPPED
                THROUGH PEAK TIME ARRAY, INITIALLY = IPATIM
44(WDAPTR)    = IPATIM(WDAPTR)  POINTER TO PEAK TIME
                ARRAY FOR THIS WORD (FOLLOWS PTMAR)
46(WDAPTR)    = T(WDAPTR)  ANALYSIS TIME FOR THIS WORD
50(WDAPTR)    = JARGON(WDAPTR)  POINTER TO CORRESPONDING
                "ANALYSIS" SUBJECTIVE TIME IN JARC
52(WDAPTR)    = UNUSED
54(WDAPTR)    = JARGON(WDAPTR)  POINTER TO SUBJECTIVE
                TIME OF LAST LIKELIHOOD PEAK
56(WDAPTR)    = WDLIM(WDAPTR)  MINIMUM WORD DURATION

WORD DESCRIPTOR ARRAY: PATTERN PARAMETER LISTS
(FOLLOWS "HEADER" INFO)

PAT1:                          # OF THIS PATTERN
PAT1+2:    THRESH(PAT1)        LIKELIHOOD THRESHOLD SETTING
                               FOR THIS PATTERN
  "  +4:   WINDOW( " )         MAXIMUM DURATION OF SEARCH
                               FOR NEXT PATTERN (= 0 FOR
                               LAST PATTERN)
  "  +6:                       # OF FRAMES AFTER PEAK OF
                               THIS PATTERN DURING WHICH
                               NEXT PATTERN MAY NOT BE FOUND
  "  +10:  ALTPAT( " )         ADDRESS OF ALTERNATE PATTERN
                               PARAMETER LIST
  "  +12:  NXTPAT( " )         ADDRESS OF NEXT PATTERN
                               PARAMETER LIST
  "  +14:  MEANS( " )          POINTER TO MEAN STATISTICS
                               FOR THIS PATTERN
  "  +16:  STDS( " )           POINTER TO STANDARD DEVIATION
                               STATISTICS FOR THIS PATTERN
PAT2:      (SAME AS ABOVE)


PATN:      N                   NUMBER OF THIS PATTERN
           = THRESH(PATN)
           = WDTIME( " )   0   (THIS IS THE LAST PATTERN IN
                               WORD PATTERN SEQUENCE)
                           0
           = ALTPAT( " )   0
           = NXTPAT( " )   0
           = MEANS( " )        POINTER TO MEANS FOR THIS
                               PATTERN
           = STDS( " )         POINTER TO STANDARD DEVIATION
                               STATISTICS FOR THIS PATTERN


WORD DESCRIPTOR ARRAY: PROSODIC TIMING MAX AND MIN ARRAY
(FOLLOWS PATTERN PARAMETER LISTS)

ASSUME THIS IS THE WDA FOR WORD WITH SERIAL # = N,
WDPCNT = K

[array entries not legible in this copy]


WORD DESCRIPTOR ARRAY: PATTERN LIKELIHOOD PEAK TIME ARRAY
(FOLLOWS ABOVE PROSODIC TIMING CONSTRAINTS)

PKARN:     TIME OF PEAK 1
           ...
           TIME OF PEAK K


END OF WORD DESCRIPTOR
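
The header listing can be cross-checked against the stated header length. The sketch below reads the offsets as octal byte offsets from WDAPTR. That reading is an assumption, though it is consistent with a last entry at offset 56 and a header of 24 sixteen-bit words. Only a subset of the named fields is included; the reader function and the little-endian word order (as on the PDP-11) are part of the illustration, not of the original program.

```python
WORD_BYTES = 2

# Octal byte offsets from the start of a WDA, as read from the listing.
HEADER_OFFSETS = {
    "CURPAT": 0o10,
    "SUMPAT": 0o12,
    "T1": 0o14,
    "WDPCNT": 0o20,
    "TIMER": 0o22,
    "WDSTRT": 0o24,
    "T0": 0o30,
    "TP": 0o32,
    "MAXL": 0o34,
    "PTMAR": 0o40,
    "IPTRT": 0o42,
    "IPATIM": 0o44,
    "T": 0o46,
}

LAST_HEADER_OFFSET = 0o56   # final header entry in the listing
HEADER_WORDS = (LAST_HEADER_OFFSET + WORD_BYTES) // WORD_BYTES


def read_field(memory, wdaptr, name):
    """Read one 16 bit header field from a byte image of memory."""
    addr = wdaptr + HEADER_OFFSETS[name]
    return int.from_bytes(memory[addr:addr + WORD_BYTES], "little")


# Hypothetical memory image with a WDA starting at byte 8.
memory = bytearray(64)
memory[8 + 0o12:8 + 0o12 + WORD_BYTES] = (3).to_bytes(WORD_BYTES, "little")
patterns_so_far = read_field(memory, 8, "SUMPAT")
```

Offset 56 octal is byte 46, so the header spans 48 bytes, which is the 24 words quoted in the description of the header.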


FLOWCHART  VARIABLE  NAMING  CONVENTIONS 


Partly because SPIN3M is written in three languages, the
flowchart conventions for variable naming require
clarification. Consider the name "X".

If X is referenced alone, the variable named has the
value of the word at address X. #X is the variable
whose value is the address X. Thus #JARC is the address
of JARC. @X refers to the variable whose address
is contained at address X. This is "indirect" addressing,
and according to PAL conventions X must be a register.
X(Rn) is the variable whose address is the
address X plus the contents of Rn, where Rn is register
n. All of the preceding are essentially PAL conventions.
X subscripted by an italic character "i" implies that X
is the name of an array, and that the variable referenced
is the ith element of the array. This is a FORTRAN
convention. If X is the character "A" or "B", then An
or Bn, where n is a positive integer (less than 256), refers
to the nth word of the vector computer A or B memory,
respectively. A Greek subscript to X (usually upper or
lower case sigma in the flowcharts) implies that the variable
referenced is an element of a Word Descriptor Array or WDA
(see dictionary of terms). An upper case sigma (Σ)
indicates that the variable referenced is part of the header
information of the Σth WDA.

A lower case sigma (σ) indicates that the variable
referenced is an element of the σth pattern parameter
list of the Σth Word Descriptor Array. For clarification
of WDA structure see the data structures section
on page 58. When it is used in this context X names
the element of the array referenced and Σ or σ denotes
the array. Thus the address of the variable CURPAT_Σ
would be the starting address of the Σth WDA plus an
offset equal to the predefined value of CURPAT. The
address of NXTPAT_σ would be the starting address of the
σth pattern parameter list in the Σth WDA plus an offset
equal to the value of NXTPAT. In short, all Greek
subscripted names are indices to some part of a Word
Descriptor Array, determining which element is the
variable referenced. For a summary of variable naming
conventions see chart below.


Summary of Flowchart Variable Naming Conventions


Name        Variable Named

X           Variable whose address is X
#X          Variable whose value is address X
@X          Variable whose address is at address X
X(Rn)       Variable whose address is X plus contents
            of register n
X_i         ith element of array X
An or Bn    nth word of V.C. A or B memory
X_Σ         Variable found X bytes after start of
            Σth Word Descriptor Array
X_σ         Variable found X bytes after start of
            σth pattern parameter list in Σth WDA
            (X has been assigned a numerical value
            in above 2 cases)
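
The conventions summarized above can be illustrated with a toy memory model. The following is only a sketch: every address, offset value and stored word in it is hypothetical, and a Python dictionary stands in for PDP-11 memory.

```python
# Toy model of the naming conventions; every address and value is hypothetical.
memory = {}                       # address -> 16-bit word

def word(addr):
    """Return the word stored at the given address."""
    return memory[addr]

X = 0o1000                        # address bound to the name X
R3 = 4                            # contents of register 3
memory[X] = 0o2000                # the word at X happens to hold an address
memory[0o2000] = 77
memory[X + R3] = 55

plain = word(X)                   # X     : value of the word at address X
address_of = X                    # #X    : the address X itself
indirect = word(word(X))          # @X    : word whose address is stored at X
indexed = word(X + R3)            # X(R3) : word at address X plus contents of R3

def wda_element(wda_start, offset):
    """Upper-case sigma subscript: word found `offset` bytes into a WDA."""
    return word(wda_start + offset)

def pattern_element(list_start, offset):
    """Lower-case sigma subscript: word found `offset` bytes into a pattern list."""
    return word(list_start + offset)

WDA1 = 0o4000                     # start of one WDA (hypothetical)
CURPAT = 0o20                     # header offset of CURPAT (assumed value)
PAT1 = 0o4100                     # start of that WDA's first pattern list
NXTPAT = 0o12                     # list offset of NXTPAT (assumed value)
memory[WDA1 + CURPAT] = PAT1      # CURPAT points at the current pattern list
memory[PAT1 + NXTPAT] = 0         # 0 marks the last pattern in the sequence

current_list = wda_element(WDA1, CURPAT)
next_list = pattern_element(current_list, NXTPAT)
```

The two helper functions differ only in which base address they are given, which is the point of the Greek subscripts: the name supplies the offset and the subscript selects the array.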
SUMMARY OF BUFFER UTILIZATION


Buffer    Use

JIN       Stores 32 word spectrum frames (circular)
JARC      Stores 1 word subjective times for each JIN frame (circular)
JAMP      Filled with amplitude of each frame in JIN
FRAM1     Pointers to first picked frame of each pattern
FRAM2     Pointers to 2nd picked frame of each pattern
FRAM3     Pointers to 3rd picked frame of each pattern
IFILT     Cosine transform matrix
IPSTAR    Prosodic timing test workspace
IWDSYM    Array of symbols associated with each target word
IW?IS     Statistics for all legitimate target words
IWDAN     Array of pointers to target word WDAs
WDA       A Word Descriptor Array, reference and status
          information unique to one target word
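
The circular use of JIN and JARC can be sketched as follows. This is an illustration only: the class and method names are invented, the 256-frame length is taken from the dictionary of symbols, and treating JARCOF as a byte offset is an assumption.

```python
FRAMES = 256            # frames held by JIN and JARC
WORDS_PER_FRAME = 32    # spectrum points per frame
BYTES_PER_WORD = 2      # PDP-11 words are 16 bits

class FrameRing:
    """Hypothetical model of the circular JIN and JARC buffers."""

    def __init__(self):
        self.jin = [[0] * WORDS_PER_FRAME for _ in range(FRAMES)]
        self.jarc = [0] * FRAMES
        self.jinofs = 0   # byte offset of the next frame slot in JIN
        self.jarcof = 0   # offset of the next slot in JARC (assumed to be bytes)

    def push(self, spectrum, subjective_time):
        """Store one frame and its subjective time, then advance both offsets."""
        frame_bytes = WORDS_PER_FRAME * BYTES_PER_WORD
        self.jin[self.jinofs // frame_bytes] = list(spectrum)
        self.jarc[self.jarcof // BYTES_PER_WORD] = subjective_time
        # Wrap around so the oldest frame is overwritten next.
        self.jinofs = (self.jinofs + frame_bytes) % (FRAMES * frame_bytes)
        self.jarcof = (self.jarcof + BYTES_PER_WORD) % (FRAMES * BYTES_PER_WORD)

ring = FrameRing()
for t in range(FRAMES + 1):          # one more frame than the buffer holds
    ring.push([t] * WORDS_PER_FRAME, t)
```

After 257 frames the first slot has been overwritten by the newest frame, so the analysis can lag real time by at most the buffer length.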
FLOWCHART TABLE OF CONTENTS


Routine Entries

SPIN4M
PICK
NEXTFR, GFRAMS, LIKFUN
WRDDET
TRKUP
T0OUT
BACKUP
GIVEUP, REINIT
RETURN
WHLWRD
PTINIT
UNCIRC
SPN4NR, SPN4N1, SPN4N2
IPART2, PART2E
START
INTWDA
IN32A7, IN32C7
VDMAV6
IN32A6
V7GO
IWT2
VJMM7
VCLK7, VCLK27
VCOUT7
[Flowchart: SPIN3M V.P. routines, autocorrelation input and smoothing. Legible annotations:]

... ARE STORED IN A96-255, AND AN ACCUMULATOR FRAME, WHOSE
AUTOCORRELATION COEFF FOR DELAY N IS THE SUM OF THE COEFF FOR
THE SAME DELAY IN THE 5 MOST RECENT FRAMES, IS STORED IN B224-255.

ACCUMULATOR FRAME IS NOW THE SMOOTHED CORRELATION, SMOOTHED
IN TIME BY 5 POINT MOVING AVERAGE.

ZERO DELAY SHORT TERM AUTOCORR COEFF IS TAKEN AS SIGNAL POWER.

C IS A CONSTANT CHOSEN TO FILL RANGE OF FOURIER TRANSFORM
ARITHMETIC UNIT WITH NORMALIZED COEFFICIENTS.
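
The accumulator frame described in the flowchart annotations (each delay's coefficient summed over the 5 most recent frames) can be sketched as below. The function name and the use of a deque are illustrative choices, not the vector processor's actual method.

```python
from collections import deque

def accumulator_frames(frames, depth=5):
    """For each incoming frame, return the per-delay sum over the last `depth` frames."""
    recent = deque(maxlen=depth)      # oldest frame drops out automatically
    sums = []
    for frame in frames:
        recent.append(frame)
        # zip(*recent) groups the coefficients that share the same delay
        sums.append([sum(column) for column in zip(*recent)])
    return sums

# Tiny hypothetical input: three 2-coefficient frames, window of 2 frames.
demo = accumulator_frames([[1, 1], [2, 2], [3, 3]], depth=2)
signal_power = demo[-1][0]            # zero-delay coefficient taken as signal power
```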


... STEP OF CALCULATION OF EQUALIZATION COEFFICIENTS

THE OBJECT IS TO FIND A SCALE FACTOR, A, SUCH THAT

A ... (SMOOTHED PEAK ARRAY) = MAX (UNSMOOTHED PEAK ARRAY).

THIS MAXIMUM RATIO IS CLEARLY THE MAXIMUM RATIO OF ANY POINT IN THE
CURRENT AMPLITUDE RESTORED HAMMED SPECTRUM FRAME TO THE CORRESPONDING
SMOOTHED PK ARRAY POINT. THUS IF WE SCALE THE SMOOTHED PEAK ARRAY BY A
MULTIPLICATIVE FACTOR OF A/C, ALL VALUES OF THE EQUALIZED FRAME, I.E.
HAMMED SPECTRUM DIVIDED BY SCALED SMOOTHED PK ARRAY, WILL LIE BETWEEN
0 AND C. THIS ALLOWS UTILIZATION OF FULL RANGE OF THE ARITHMETIC UNIT.

3RD AND FINAL STEP OF CALCULATION OF EQUALIZATION COEFFICIENTS

MULTIPLY SMOOTHED PEAK ARRAY BY RATIO A/C TO YIELD 32
EQUALIZATION COEFFICIENTS.

EQUALIZE CURRENT AMPLITUDE RESTORED HAMMED SPECTRUM FRAME TO YIELD
"EQUALIZED FRAME". EACH POINT OF AMP RESTORED HAMMED SPECTRUM FRAME
IS DIVIDED BY CORRESPONDING COEFF IN EQUALIZATION ARRAY.

FIND THE AMPLITUDE OF THE EQUALIZED SIGNAL, AND REPLACE 32ND SPECTRUM
... WITH THIS VALUE.
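
The scaling argument in the equalization annotations can be checked numerically. This is a minimal sketch assuming A is the largest point-by-point ratio of the spectrum frame to the smoothed peak array; the function name and the sample values are hypothetical.

```python
def equalize(spectrum, smoothed_peak, c):
    """Return (equalized frame, equalization coefficients).

    Assumes every smoothed-peak value is positive and the frame is not all zero.
    """
    # A: maximum ratio of any spectrum point to its smoothed-peak point
    a = max(s / p for s, p in zip(spectrum, smoothed_peak))
    # Scale the smoothed peak array by A/C to obtain the coefficients
    coeffs = [p * a / c for p in smoothed_peak]
    # Divide each spectrum point by its coefficient
    equalized = [s / k for s, k in zip(spectrum, coeffs)]
    return equalized, coeffs

frame, coefficients = equalize([2.0, 8.0, 3.0], [4.0, 4.0, 6.0], c=100.0)
```

Because every ratio is divided by the largest ratio, the largest equalized value lands on C and the rest fall below it, which is the property that lets the arithmetic unit use its full range.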


[Flowchart: SPIN3M V.P. routines, subjective time calculation. Legible annotations:]

SUM OF WEIGHTED DISTANCES ALONG 1ST 16 SPECTRAL AXES IN
MULTIDIMENSIONAL SPACE OF FREQUENCY COMPONENTS, FOR i = 0 TO 15.

1ST 16 POINTS OF PREVIOUS EQUALIZED FRAME ARE SAVED IN B192-207.
LATEST EQUALIZED FRAME STORED AT A0-31. WEIGHTING CONSTANTS STORED
IN PDP ARRAY IWT. ONLY 1ST 16 SPECTRUM POINTS ARE USED TO CALCULATE
ARC LENGTH INTERFRAME DISTANCE OR "SUBJECTIVE TIME".

ACCUMULATE SUM OF 16 WEIGHTED AXIAL DISTANCES AND STORE IN B ...

SAVE 1ST 16 POINTS OF NEW EQUALIZED SPECTRUM AT A0-31 IN SPACE FOR
PREVIOUS EQUALIZED SPECTRUM AT B192-207.

METRIC FOR 16 DIMENSIONAL SPECTRUM SPACE IS WEIGHTED "TAXICAB METRIC"

$D(X, Y) = \sum_{i=0}^{15} c_i \, |x_i - y_i|$

WHERE THE $c_i$ ARE WEIGHTS. SUM OF WEIGHTED AXIAL DISPLACEMENTS
BETWEEN COORDINATES OF TWO FRAMES IS CALLED "SUBJECTIVE TIME"
ELAPSED BETWEEN THE FRAMES.
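
The weighted taxicab metric over the first 16 spectrum points can be written out directly. The sketch below assumes the weights play the role of the IWT array; the function name and the sample frames are hypothetical.

```python
def subjective_time_step(previous, current, weights):
    """Weighted sum of absolute differences over the first 16 spectrum points."""
    return sum(
        w * abs(c - p)
        for w, c, p in zip(weights[:16], current[:16], previous[:16])
    )

# Two hypothetical 32-point equalized frames and uniform weights.
prev_frame = [0] * 32
new_frame = [1] * 16 + [9] * 16     # the upper 16 points are ignored
weights = [2] * 16
step = subjective_time_step(prev_frame, new_frame, weights)
```

A frame identical to its predecessor advances subjective time by zero, so steady sounds occupy little subjective time however long they last on the clock.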
[Flowchart: SPIN3M V.P. routines, continued. Legible annotations:]

LOG TRANSFORMED FRAME IS SCALED TO RANGE BETWEEN + AND - ...
(FOR PURPOSES OF LIKELIHOOD CALCULATION). JIN IS ARRAY OF LAST ...
LOG TRANSFORMED FRAMES.

RETURN CONTROL TO PDP-11


[Flowchart: SPIN3M V.P. routines, likelihood calculation. Legible annotations:]

COMPUTE THE DIFFERENCES BETWEEN ... PATTERN VALUES ... AND
REFERENCE PATTERN MEAN VALUES ...

... ENTERED WITH V.P. BUS ADDRESS REGISTER POINTING TO REFERENCE
PATTERN MEAN VALUES. THEREAFTER ... TO CALC LIKELIHOOD THAT THE
THREE FRAMES IN A MEMORY ARE SAME PATTERN AS ONE CURRENTLY SOUGHT.
FOR A SOUGHT PATTERN WHOSE REFERENCE PATTERN VALUES ARE μ_i,
i = 0-95, AND STANDARD DEV VALUES ARE σ_i, i = 0-95, LIKELIHOOD P THAT
PATTERN WITH VALUES x_i IS SAME AS REFERENCE PATTERN IS GIVEN BY

P = ...

WHERE F IS A SCALE FACTOR AND L IS A ... OFFSET TERM. P IS INVERSELY
RELATED TO CLOSENESS OF FIT OF TEST PATTERN TO REFERENCE PATTERN.

MULTIPLY ... B0-95 BY INVERSES OF STANDARD DEVIATIONS AND SQUARE.
ACCUMULATE SQUARES ...

... ENTERED W/ V.P. BUS ADDRESS REGISTER POINTING TO MULTIPLICATIVE
INVERSES OF REFERENCE PATTERN STANDARD DEVIATIONS.
[Flowchart: SPIN3M V.P. routines, VCOUT7. Legible annotations:]

WRITE "LIKELIHOOD" IN F TO PDP-11 WORD "LAMBDA"

RETURN CONTROL TO PDP-11

VCOUT7: ENTERED W/ V.P. BUS ADDRESS REG POINTING TO PDP "LIKELIHOOD"
DEST, LAMBDA. LAMBDA IS ACTUALLY A CONSTANT PLUS A TERM APPROXIMATELY
INVERSELY PROPORTIONAL TO THE LIKELIHOOD THAT THE PATTERN TESTED IS
SAME AS REFERENCE PATTERN. GREATER LAMBDA IMPLIES LESSER SIMILARITY.
THUS THE PATTERN WITH LEAST LAMBDA WOULD BE CHOSEN AS MOST SIMILAR.
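
One way to picture a score that behaves like LAMBDA is sketched below. The exact formula is not legible in the flowchart, so this is an assumption: it treats LAMBDA as a constant plus the accumulated squares of the deviations from the reference means, each scaled by the inverse of the reference standard deviation. The function names and sample data are hypothetical.

```python
def lambda_score(values, means, stds, constant=0.0):
    """Assumed form: constant + sum of ((x - mean) * (1 / std)) squared."""
    total = 0.0
    for x, m, s in zip(values, means, stds):
        scaled = (x - m) * (1.0 / s)   # multiply by the inverse standard deviation
        total += scaled * scaled       # accumulate squares
    return constant + total

def most_similar(values, references):
    """Pick the reference name with the least lambda (greater lambda = less similar)."""
    return min(
        references,
        key=lambda name: lambda_score(values, *references[name]),
    )

# Two hypothetical reference patterns: name -> (means, standard deviations)
refs = {
    "pattern_a": ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    "pattern_b": ([5.0, 5.0, 5.0], [1.0, 1.0, 1.0]),
}
best = most_similar([4.5, 5.5, 5.0], refs)
```

In the real system each pattern spans three 32-point frames, so the sums would run over 96 values rather than the three used here.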
DICTIONARY OF SYMBOLS AND VARIABLES FOUND IN FLOWCHART


#         NUMBER; INDICATES "ADDRESS OF" WHEN PRECEDING A VARIABLE
A         A MEMORY, ONE OF TWO 256 WORD X 16 BIT DATA MEMORIES
          IN THE VECTOR COMPUTER
A0        1ST WORD OF VECTOR COMPUTER A MEMORY.
ADDR      ADDRESS (ABBREVIATION)
ALTPAT    PATTERN PARAMETER LIST ELEMENT: PTR TO CURRENT PAT. ALTERNATIVE
AMP       DESTINATION OF RMS AMPLITUDE CALCULATED BY V.P.
AN        (N-1)TH WORD OF V.C. A MEMORY.
ARC       REPOSITORY IN PDP-11 FOR SUBJECTIVE TIME OF CURRENT INPUT
          FRAME AS CALCULATED BY THE V.P.
AUTOCOR   AUTOCORRELATOR (ABBREVIATION).
B         B MEMORY, ONE OF TWO 256 WORD X 16 BIT DATA MEMORIES
          IN THE VECTOR COMPUTER.
B0        1ST WORD OF VECTOR COMPUTER B MEMORY.
BN        (N-1)TH WORD OF VECTOR COMPUTER B MEMORY
COEFF     COEFFICIENT (ABBREVIATION)
COMP      COMPUTE (ABBREVIATION)
CURPAT    WDA ELEMENT: POINTER TO PATTERN PARAMETER LIST FOR CURRENT
          PATTERN SOUGHT.
DUMMY     ARRAY USED FOR PASSAGE OF VARIABLES TO FORTRAN. AT TERMINATION
          OF REAL TIME SEARCH IT CONTAINS OFFSET TO OLDEST JARC TIME.
F LATCH   VECTOR PROCESSOR STORAGE REGISTER.
FRAM1     BUFFER OF POINTERS TO 1ST FRAME PICKED FOR PATTERN CORRESPONDING
          TO EACH FRAME IN JIN. SAME CONSTRUCTION AS JARC.
FRAM2     BUFFER OF POINTERS TO 2ND FRAME PICKED FOR PATTERN CORRESPONDING
          TO EACH FRAME IN JIN. SAME CONSTRUCTION AS JARC.
FRAM3     BUFFER OF POINTERS TO 3RD FRAME PICKED FOR PATTERN CORRESPONDING
          TO EACH FRAME IN JIN. SAME CONSTRUCTION AS JARC.
FRAME     BASIC UNIT OF INPUT DATA. ORIGINALLY 32 AUTOCORRELATION
          COEFFICIENTS, SUBSEQUENTLY PREPROCESSED INTO A 32 POINT
          SPECTRUM. ONE FRAME IS A UNIT OF TIME EQUAL TO
          10 MILLISECONDS BY THE CLOCK.
FTN       FORTRAN (ABBREVIATION)
1302      INFORMATIONAL MESSAGE PRINTED ON KB AFTER TERMINATION
          OF REAL TIME SEARCH. ACCOMPANIED BY VALUE OF TC-T.
1303      INFORMATIONAL MESSAGE PRINTED ON KB IF ANALYSIS TIME FALLS
          TOO FAR BEHIND REAL TIME. ACCOMPANIED BY TC-T.
IDT       REAL TIME SEPARATION BETWEEN PICKED FRAMES OF PATTERN,
          GIVEN IN UNITS OF 10 MS.
IFILT     ARRAY OF COSINE TRANSFORM COEFFICIENTS USED
          TO CALCULATE SPECTRUM OF INPUT FRAME
INIT      INITIALIZATION FLAG SET BY FORTRAN BEFORE CALLING IPART2,
          AND CLEARED BY IPART2 AFTER INITIALIZATION
INTWDA    SUBROUTINE TO INITIALIZE THE WORD DESCRIPTOR ARRAY
          CURRENTLY POINTED TO BY WDAPTR
IPART2    NON-REAL TIME LIKELIHOOD CALCULATION SUBROUTINE. CALCS LIKE
          THAT PATTERN ASSOC WITH EACH FRAME IS SOUGHT PATTERN
IPATIM    WDA ELEMENT: POINTER TO START OF PEAK TIME ARRAY.
IPSTAR    WORKSPACE FOR CALCULATION OF DETECTED WORD'S PROSODIC TIMING
          PARAMETERS
IPTPTR    WDA ELEMENT: PTR TO DESTINATION OF NEXT PEAK TIME IN PEAK
          TIME ARRAY (INITIALLY = IPATIM)
ISNRM     BUFFER USED BY NON-REAL TIME ROUTINES TO STORE EITHER LIKE OF
          EACH FRAME (IPART2), OR # OF PATS FOUND AT EACH FRAME (SPN4NR).
IWDAN     ARRAY OF POINTERS TO TARGET WORD WORD DESCRIPTOR ARRAYS
IWDSYM    ARRAY OF SYMBOLS ASSOCIATED WITH EACH TARGET WORD, IN ORDER
          OF TARGET WORD SPECIFICATION
IWT       ARRAY OF SUBJECTIVE TIME WEIGHTS. SUBJ TIME IS
          BASED ON THE SUM OF WEIGHTED SPECTRAL CHANGES
JAMP      ARRAY OF AMPLITUDES OF EACH FRAME IN JIN. FILLED ONLY AFTER
          TERMINATION OF REAL TIME SEARCH.
JARC      256 WORD ARRAY OF 16 BIT FRAME SUBJECTIVE TIMES (CIRCULAR).
JARCEN    LAST WORD OR END OF JARC.
JARCOF    OFFSET TO JARC INDICATING DESTINATION OF NEXT INPUT
          FRAME'S SUBJECTIVE TIME
JARCON    WDA ELEMENT: CONTAINS BYTE OFFSET IN JARC TO LOCATION OF
          SUBJECTIVE TIME CORRESPONDING TO THIS WORD'S ANALYSIS TIME.
JARCOP    WDA ELEMENT: JARC OFFSET TO SUBJ. TIME CORRESPONDING TO
          PEAK LIKELIHOOD
JIN       8K WORDS. ARRAY OF 256 32 WORD SPECTRUM FRAMES (CIRCULAR)
JINOFS    BYTE OFFSET TO JIN GIVING DESTINATION OF NEXT 32 POINT
          SPECTRUM FRAME.
JINPTR    POINTER TO DESTINATION OF NEXT INPUT SPECTRUM FRAME IN JIN.
          JINPTR = #JIN + JINOFS
KB        KEYBOARD (ABBREVIATION)
LAMBDA    PATTERN SIMILARITY MEASURE (APPROX. INVERSELY PROPORTIONAL
          TO LIKE THAT TEST PATTERN IS SAME AS THE REFERENCE PATTERN).
LC        ALPHABETIC ARGUMENT OF KEYBOARD COMMAND.
LIKE      LIKELIHOOD (ABBREVIATION)
MAXL      WDA ELEMENT: PEAK LIKELIHOOD FOUND FOR CURPAT SO FAR.
MEANS     PATTERN PARAMETER LIST ELEMENT: POINTER TO START OF
          MEAN VALUE STATISTICS FOR THIS PATTERN.
MOD       MODULO: X MODULO N = REMAINDER OF X/N
MVALUE    MEAN VALUE OF DETECTED WORD'S PATTERN PEAK TIMES.
NC        NUMERICAL ARGUMENT OF KEYBOARD COMMAND
NPFLO     NON-REAL TIME WORD DETECTION FLAG, SET IF SPN4NR EXECUTING
NTHWRD    BYTE OFFSET IN IWDAN TO PTR POINTING TO ARRAY OF WORD
          CURRENTLY SOUGHT. BYTE OFFSET TO SYMBOL IN IWDSYM
NWORDP    NUMBER OF WORDS TO BE SOUGHT (AT PRESENT A MAXIMUM OF 2).
NXTPAT    PATTERN PARAMETER LIST ELEMENT: PTR TO PAT. SUCCEEDING CURPAT
PAL       PDP-11 ASSEMBLY LANGUAGE
PASTPT    WDA ELEMENT: FLAG SET IF CURRENT PATTERN SOUGHT HAS CROSSED
          LIKELIHOOD THRESHOLD, AND IS BEING TRACKED FOR PEAK
PATTERN   BASIC SOUND UNIT. COMPOSED OF THREE SPECTRAL FRAMES
          EQUALLY SPACED IN SUBJECTIVE TIME
PC        PROGRAM COUNTER, I.E., R7 FOR PDP-11
PDP       PROGRAM DATA PROCESSOR, DEC MACHINE
PK        PEAK (ABBREVIATION)
POINTER   A POINTER TO X IS A WORD CONTAINING THE ADDRESS OF X
PRT2LF    FLAG SET (NOT = 0) IF NON-REAL TIME LIKELIHOOD ROUTINE RUNNING
PTMAR     WDA ELEMENT: POINTER TO ARRAY OF PROSODIC TIMING PARAMETER
          MAXIMUM AND MINIMUM VALUES.
PTR       POINTER (ABBREVIATION)
QRDRS     DESTINATION OF LIKELIHOOD CALCULATED BY IPART2 FOR
          EACH FRAME. USED TO PASS LIKE. TO FORTRAN.
R0        REGISTER 0
R1        REGISTER 1
R2        REGISTER 2
R3        REGISTER 3
R4        REGISTER 4
R5        REGISTER 5
R6        REGISTER 6
R7        REGISTER 7
SAVE      SUBROUTINE TO SAVE CONTENTS OF REGISTERS 0 THRU 3.
SETPTR    SUBROUTINE CALLED BY FORTRAN TO SET TARGET WORDS AND SET
          POINTERS TO THEIR STATISTICS IN THEIR WORD DESCRIPTOR ARRAYS.
SP        STACK POINTER, I.E., R6 FOR PDP-11
SPIN3M    SPEECH INTERPRETER 3, MULTIPLE WORD SPOTTING (ENTIRE PROGRAM).
SPIN4M    SPEECH INTERPRETER 4, MULTIPLE WORD SPOTTING. REAL TIME
          PAL SUBROUTINE
SPN4NR    NON-REAL TIME WORD SPOTTING SUBROUTINE. SEEKS ONE TARGET
          WORD IN DATA SAVED IN BUFFERS FROM REAL TIME RUN.
START     BUFFER INITIALIZATION SUBROUTINE (DOES NOT CHANGE TARGET WORDS).
STDS      PATTERN PARAMETER LIST ELEMENT: POINTER TO START OF
          STANDARD DEVIATION STATISTICS FOR THIS PATTERN
SUBJ      SUBJECTIVE (ABBREVIATION)
SUMPPT    WDA ELEMENT: NUMBER OF PATTERNS IN WORD PATTERN SEQUENCE
          DETECTED SO FAR
SWR       CONSOLE SWITCH REGISTER
T         WDA ELEMENT: ANALYSIS TIME. TIME (IN REAL TIME UNITS) OF
          FRAME ACTUALLY BEING ANALYZED. TC-256 < T < TC
T0        WDA ELEMENT: EXPIRATION TIME OF PEAK TRACKING FOR THIS PATTERN.
          T0 = (TIME OF FIRST THRESH CROSSING) + TRKTIM
T1        WDA ELEMENT: TP+WINDOW, EXPIRATION TIME OF SEARCH FOR CURRENT
          PATTERN SOUGHT
TC        CURRENT "REAL" TIME. INCREMENTED 1 UNIT = 10 MS FOR EVERY NEW
          AUTOCORRELATION INPUT FRAME ABOVE AMPLITUDE THRESHOLD.
THIRD     NEXTFR LOOP COUNTER. INCREMENTED WITH EACH LOOP
          FROM NEXTFR. AFTER THIRD LOOP, GO GET NEW INPUT FRAME
THR       AMPLITUDE THRESHOLD, BELOW WHICH INPUT FRAME IS IGNORED
THRESH    PATTERN PARAMETER LIST ELEMENT: INDICATES LIKELIHOOD
          THRESHOLD FOR THAT PATTERN
TIMER     WDA ELEMENT: (TP OF PAT 1) + WDLMIN = EARLIEST ACCEPTABLE TIME
          FOR TOTAL WORD END
TOTIME    TOTAL REAL TIME DURATION OF DETECTED WORD
TP        WDA ELEMENT: TIME OF CURPAT PEAK LIKELIHOOD
TRKTIM    LENGTH OF INTERVAL FOR WHICH EACH PATTERN LIKELIHOOD PEAK
          IS TRACKED.
UNSAVE    SUBROUTINE TO RESTORE PREVIOUSLY SAVED VALUES OF REGISTERS
          0-5
VBA       VECTOR COMPUTER BUS ADDRESS REGISTER
V C       VECTOR COMPUTER (ABBREVIATION)
V P       VECTOR PROCESSOR (ABBREVIATION)
VPC       VECTOR COMPUTER PROGRAM COUNTER REGISTER
WD        WORD (ABBREVIATION)
WDA       WORD DESCRIPTOR ARRAY. ARRAY OF REFERENCE AND STATUS INFORMATION
          CONCERNING ONE OF THE TARGET WORDS. SEE DATA STRUCTURES SECTION.
WDAPTR    POINTER TO WORD DESCRIPTOR ARRAY OF WORD CURRENTLY SOUGHT.
WDLMIN    WDA ELEMENT: MINIMUM WORD LENGTH GIVEN IN # OF FRAMES.
WDPCNT    WDA ELEMENT: # OF PATTERN DETECTIONS COMPRISING A WORD
          DETECTION.
WDSTRT    WDA ELEMENT: FLAG SET IF FIRST PATTERN HAS BEEN FOUND
          FOR WORD SOUGHT. "WORD STARTED"
WINDOW    PATTERN PARAMETER LIST ELEMENT: # OF FRAMES TO END OF
          DETECTION WINDOW FOR NEXT PATTERN.
W/        WITH (ABBREVIATION)
Introduction


The Dialog vector computer is a high-speed single-cycle
processor designed for rapid vector arithmetic.
The vector computer consists of several digital arithmetic
units and data memories connected to each other via three
32-bit data buses. The PDP-11 host computer exerts
primary control over the vector computer via four device
registers on the PDP-11 Unibus. The first of these is the
vector computer "PC" register, which is used to specify the
starting address of the vector computer program. Modification
of this register places the vector computer in "run" state.
The second register is the Unibus address register, which
the vector computer uses when transferring data to or from
the host computer's memory. The third register is the
program "LOAD" register. This register is used to load
the vector computer program memory from the host computer.
This operation is the only way to place data in the vector
computer program memory. The fourth register is used for
hardware testing.

Vector  Computer  Unibus  Address: 

PC  register  - 167730 

Unibus  address  register  - 167732 

Program  load  register  - 167734 

Diagnostic  register  - 167736 


The cross-assembler XASM is used to create
binary program images suitable for loading into the vector
computer via the program load register. XASM takes
source text using syntax similar to the PAL-11 assembler
and outputs an object module compatible with the system
linker. The object module is given a global name so
that the programmer may load the module with a single
"MOV" instruction.


        .GLOBL  NAME

        MOV     #NAME,@#167734  ; loads module called "NAME"

In essence, XASM simply provides symbolic names for
bit patterns representing XASM instructions and a syntax
for specifying bit settings within the vector computer
instruction. XASM permits the programmer to specify
instruction settings as octal, decimal, or binary numbers,
or by using built-in or user-defined symbols. XASM has
a large set of standard symbolic names for computer devices
and operations.

Vector  Processor  Instruction  Classes 


The vector processor recognizes four different
operation codes. They are:


    00    Arithmetic-logic instructions    (ALU)
    01    Data class instructions          (MISCELLANEOUS)
    10    Bus transfer instructions        (BUS)
    11    Program branch instructions      (BC)


Each instruction takes one machine cycle (120 nsec)
to execute. Each instruction has four common bits in
addition to the op-code. Three of these bits are used to
enable the three data buses for transfers during the
instruction execution (if desired). The fourth bit is the
"REPEAT" bit. If set, the instruction will be repeated
for the number of times stored in the "REPEAT COUNTER".
These bits may be set by using the ".COM" instruction.

The rest of the 32-bit instruction is used to specify the
exact function. Arithmetic (ALU) instructions cause the
ALU to perform some single-cycle arithmetic functions using
the contents of the ALU registers (the "AR", "BR", and "FR").
The bus transfer instructions are used to set up the three
processor buses ("A", "B", and "D") to transfer between
specified processor registers or devices. A "BUS"
instruction will actually perform the transfer if the
"BUS ENABLE" common bits are set for the buses involved.
The branch instructions are used to conditionally branch
in a vector computer program and optionally save a return
address. The branch address may be part of the instruction,
the previously saved return address, or may come from the
processor D-bus. The data class instructions are generally
processor control instructions.
Data bus structure; the BUS class instruction


There are three principal data buses, named A,
B, and D. Each bus is 32 bits wide. During an instruction
cycle, which lasts 120 nanoseconds, there may be at
most one source and one destination for data on each
bus. Each source and each destination has a four-bit
address code which is set up by executing a BUS class
instruction. The address code is transmitted on an independent
address bus. For example, the "B memory" scratchpad is
a possible source for the B bus and the "B register" input
to the arithmetic unit is a possible destination for the
B bus. The BUS instruction has a total of six address
fields, which must be specified in a particular order
when writing a BUS instruction in the assembly language.

The protocol is

    BUS  A bus source, A bus destination,
         B bus source, B bus destination,
         D bus source, D bus destination

Each data source or destination has a mnemonic assembly
language code which may be entered at the appropriate
spot in a BUS class assembly language instruction. The
mnemonics appear as port labels in Figure 11.
The sources and destinations stipulated in a BUS
instruction remain active until superseded by subsequently
executing another BUS instruction. However, no data is
actually transmitted or received on any data bus unless
the bus has been enabled explicitly by setting its associated
bus enable bit in the program instruction word. With one
exception explained in the next section, any bus may be
enabled on any instruction, and data transfer will take
place as specified in the most recently executed BUS
instruction. If a bus is enabled during a BUS instruction,
data transfer will take place in accordance with the
specifications in that same instruction. The bus enable
bits are bit 3 for the A bus, bit 4 for the B bus, and
bit 5 for the D bus. In the assembly language any desired
combination of buses may be enabled by writing

    .COM  Arg1, Arg2, ...

on the line immediately following the instruction on which
the bus or buses are to be enabled. The assembly language
arguments Argi are

    AE to enable the A bus
    BE to enable the B bus
    DE to enable the D bus

The only other legal argument to .COM is RE, which
is explained in the section on the do-loop and repeat counters.

Table IV gives a complete listing of legal bus source and
destination mnemonics.
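
The bus enable bits can be pictured as a small mask builder. This is a sketch only: the function name is invented, and RE is left out because its bit position is not stated in this section.

```python
# Bit positions of the bus enable bits, as stated in the text.
BUS_ENABLE_BITS = {"AE": 3, "BE": 4, "DE": 5}

def com_mask(*args):
    """Return the instruction-word bits set by a .COM line with these arguments."""
    mask = 0
    for name in args:
        # Each argument sets one enable bit; unknown names raise KeyError.
        mask |= 1 << BUS_ENABLE_BITS[name]
    return mask

a_and_d = com_mask("AE", "DE")        # enable the A and D buses only
all_buses = com_mask("AE", "BE", "DE")
```

Setting an enable bit does not choose the source or destination; those come from the most recently executed BUS instruction.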


The following characteristics of the processor
hardware must be understood for proper programming results.
All operations in the processor are driven by a master clock
whose period is 0.12 microsecond. All bus sources are
driven by tri-state (high, low, off) drivers, and all
destinations receive data via clocked latches. Thus at
the beginning of each master clock cycle, data is gated
onto all enabled buses, but no data is received at any
bus destination until the end of that cycle (i.e., the
beginning of the next cycle). All devices (e.g., memories,
arithmetic unit, ...) connected to the buses have
characteristic propagation delay times which must be
honored to ensure that their outputs are valid at the
time they are actually received at the desired destination.
For example, in most operations the arithmetic unit requires
one clock cycle for its outputs to settle. Thus to add
X and Y, the programmer may cause X to be written into the
A register and Y to be written into the B register on
completion of instruction n, but the sum X+Y will not be
valid at any enabled destination until one clock cycle later,
at the end of the next following instruction n+1. In array
operations this means that fastest execution speed is
obtained by filling a short "pipeline" consisting of a
chain of registers before entering a repetitive software
loop to step through the array.
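
The benefit of keeping the pipeline full can be seen in a toy cycle count. The model below is an assumption-laden sketch, not a description of the hardware: it supposes one operand pair is latched per cycle and each sum becomes available one cycle later.

```python
CYCLE_NS = 120   # master clock period in nanoseconds, from the text

def pipelined_add(xs, ys):
    """Add two arrays element by element; return (sums, cycles used)."""
    sums = []
    in_flight = None                  # operand pair latched on the previous cycle
    cycles = 0
    # One extra pass with None drains the last pair out of the pipeline.
    for pair in list(zip(xs, ys)) + [None]:
        if in_flight is not None:
            sums.append(in_flight[0] + in_flight[1])   # valid one cycle after latching
        in_flight = pair
        cycles += 1
    return sums, cycles

sums, cycles = pipelined_add([1, 2, 3, 4], [10, 20, 30, 40])
elapsed_ns = cycles * CYCLE_NS
unpipelined_cycles = 2 * 4            # load, then wait, for each of 4 elements
```

Under these assumptions four additions take five cycles instead of eight, and the saving grows with the array length.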

Because  the  processor  is  designed  especially  for 
fast  execution  of  array  operations,  each  data  memory 
is  accessed  via  its  own  address  counter  which  may  be 
set  to  increment  automatically  whenever  the  memory  is 
referenced  on  an  enabled  bus.  The  address  counter  is 
treated  as  a separate  device  by  the  BUS  instruction. 

For  example,  to  access  address  a in  the  B memory,  one 
first  deposits  the  number  a in  the  B memory  address 
counter,  which  is  a destination  on  the  A bus.  The 
contents  of  B memory  location  a will  become  valid 
and  available  for  transfer  by  the  end  of  the  next 
following  instruction.  Alternatively,  data  may  be 
written  into  the  B memory  at  location  a on  the  same 
instruction  which  loaded  the  address  counter;  however  the 
actual  write  cycle  occurs  while  the  next  following 
instruction  is  being  executed,  so  it  is  not  possible 
to  read  from  the  memory  until  the  second  instruction 
following  a write  instruction.  A succession  of  read 
operations  or  write  operations  may  be  executed  on 
contiguous  instructions,  but  the  A and  B scratchpad 
memories  must  be  given  one  bus  cycle  to  recover  when 
switching  from  writing  to  reading;  this  recovery  cycle 
may be used to load a new address into the address counter
without disturbing the write operation, since the address
information is not actually changed until the end of the
instruction.
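The addressing rules above can be sketched in Python as follows. This is an illustrative model only: the class CountedMemory, the function first_read_after_write_hazard, and the wrap-around of the address counter at the end of memory are assumptions made for the example, not details taken from the hardware. The checker encodes just one rule from the text, namely that a read may not be issued on the instruction immediately following a write.

```python
class CountedMemory:
    """Toy data memory reached through its own address counter."""

    def __init__(self, size):
        self.cells = [0] * size
        self.addr = 0                      # address counter, a separate bus destination

    def load_address(self, a):
        self.addr = a

    def _advance(self):
        # Wrap-around at the end of memory is an assumption of this sketch.
        self.addr = (self.addr + 1) % len(self.cells)

    def read(self, auto_increment=False):
        value = self.cells[self.addr]
        if auto_increment:
            self._advance()
        return value

    def write(self, value, auto_increment=False):
        self.cells[self.addr] = value
        if auto_increment:
            self._advance()


def first_read_after_write_hazard(ops):
    """Return the index of the first read placed directly after a write, else None.

    ops holds one character per instruction: 'R' read, 'W' write, '-' no access.
    """
    for i in range(1, len(ops)):
        if ops[i] == "R" and ops[i - 1] == "W":
            return i
    return None
```

A schedule such as "WW-R" passes the check because the recovery cycle separates the last write from the read, whereas "WWR" is flagged at index 2.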
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TABLE  IV.  BUS 


BUS  CLASS  INSTRUCTION  CODES 


FORMAT:

BUS:  A-Source, A-Destination, B-Source, B-Destination,
      D-Source, D-Destination

This instruction sets all bus addresses for all
processor buses.

A-Destination uses .word ^B0000111100000000
as a mask with these operands:

    AR    - A-register in ALU                 17
    RC    - Repeat counter                    16
    SC    - Shift counter                     15
    L1    - Loop counter 1                    12
    L2    - Loop counter 2                    14
    DMC   - D-memory control register         10
    DMA   - D-memory address register          7
    BMA   - B-memory address register          6
    AMA   - A-memory address counter           5

A-Source uses .word 0, ^B1111000000000000 as a mask with these
operands:

    AM    - A-memory output                   17
    AMI   - A-memory output w/auto increment  16
    DBUS  - Low 16 bits of D bus connected
            to high 16 bits of A bus          15
    NORM  - Normalizer output                 14
    MUL   - Multiplier output                 13
    I     - Data from instruction              1

B-Source uses .word ^B0000000000001111, 0 as a mask with these
operands:

    BM    - B-memory output                   17
    BMI   - B-memory output, auto incremented 16
    SH    - Shift output                      15
    DBUS  - Low 16 bits of D bus connected
            to low 16 bits of B bus           14

B-Destination uses .word ^B0000000011110000, 0 as a mask with
these operands:

    BR    - B-register in ALU                 17
    MUL   - Multiplier input                  16


D-Source uses .word ^B0000111100000000, 0 as a mask with these
operands:

    PDP   - PDP-11 data bus latch             17
    ALU   - Output of ALU                     16
    COR   - Auto correlator output            14
    DM    - D-memory output                    6


D-Destination uses .word ^B1111000000000000, 0 as a mask with
these operands:

    PDP   - PDP-11 data bus latch             17
    BM    - B-memory input                    16
    AM    - A-memory input                    15
    BMI   - B-memory auto increment           14
    AMI   - A-memory auto increment           13
    SH    - Shifter input                     12
    MUL   - Multiplier input                  11
    COR   - Auto correlator input             10
    PDPA  - PDP bus address register           7
    DM    - D-memory input                     6
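One way to read the masks and operand codes in Table IV is that each code is shifted into the four-bit field selected by its mask. The Python sketch below illustrates that reading for the D bus fields. It rests on two assumptions that the table does not state outright: that the operand codes are octal, and that a code is placed at the low end of its mask field. The helper names place_field and d_bus_fields are hypothetical, and only a few of the destination codes are included.

```python
D_SOURCE_MASK = 0b0000111100000000
D_DEST_MASK = 0b1111000000000000

# Operand codes copied from Table IV, interpreted here as octal (an assumption).
D_SOURCE = {"PDP": 0o17, "ALU": 0o16, "COR": 0o14, "DM": 0o6}
D_DEST = {"PDP": 0o17, "BM": 0o16, "AM": 0o15, "SH": 0o12, "MUL": 0o11, "DM": 0o6}


def place_field(code, mask):
    """Shift `code` into the contiguous bit field selected by `mask`."""
    if mask == 0:
        raise ValueError("mask must select at least one bit")
    shift = (mask & -mask).bit_length() - 1   # position of the lowest set bit
    value = code << shift
    if value & ~mask:
        raise ValueError("code does not fit in the field")
    return value


def d_bus_fields(source, dest):
    """Combine a D-Source and a D-Destination mnemonic into one 16-bit word."""
    return (place_field(D_SOURCE[source], D_SOURCE_MASK)
            | place_field(D_DEST[dest], D_DEST_MASK))
```

Under these assumptions, routing the ALU output to the D-memory input gives d_bus_fields("ALU", "DM"), which evaluates to hexadecimal 6E00.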
Figure 12 (cont'd). Arithmetic control signals
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Arithmetic  instruction 


The  arithmetic  unit  has  a 32-bit  input  to  the  A register 
from  the  A bus,  a 32-bit  input  to  the  B register  from  the 
B bus,  and  a 32-bit  output  which  drives  the  D bus.  The 
corresponding  BUS  instruction  mnemonics  are  DAREG  for  the 
A input, DBREG for the B input, and SARITH for the D bus
data  output.  The  function  to  be  performed  by  the  arithmetic 
unit  is  determined  by  executing  an  ALU  class  instruction; 
thereafter,  the  arithmetic  unit  will  continue  to  perform 
the  same  function  (with  certain  exceptions  listed  below) 
until  the  function  is  changed  by  executing  another  ALU  class 
instruction.

The ALU instruction in the assembly language contains
four specification fields which identify the arithmetic or
logical function to be performed, the A input source and
A register operation, the B input source and B register
operation, and the F register source and its operation.
The flow of data is illustrated and related to the ALU
instruction bits in Figure 12.

The A and B registers are four-function bidirectional
shift registers which can be controlled independently to
perform arithmetic shift up, arithmetic shift down, hold,
or load operations. On an ALU instruction, the A or B
register will not be loaded from the A or B bus unless the
instruction carries the "load" code (mnemonic LD) in the
field for the register to be loaded. Note also that the
data loaded on an ALU instruction will not be valid unless
the appropriate data buses are enabled by the assembly
language .COM directive appended to the instruction.

On load or hold operations, all 32 bits are affected.
For shift operations in the A register, all 32 bits are
affected. For shifts in the B register, all 32 bits are
affected unless the "divide" bit is set, in which case the
bit shifted up from bit 15 is lost, and a quotient bit is
shifted up into bit 16. In the assembly language, the "hold"
mode is understood unless LD (load), SHUP (shift up), or
SHDN (shift down) is written in the A or B register field.
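The four register operations can be modeled in a few lines of Python. The sketch below is illustrative only: register_step and to_signed32 are hypothetical names, the special B-register behavior when the "divide" bit is set is deliberately omitted, and it is assumed that an arithmetic shift up fills the vacated low bit with zero while an arithmetic shift down preserves the sign bit.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def register_step(current, op="HOLD", bus=0):
    """Return the next 32-bit contents of an A or B register.

    op is one of "LD", "SHUP", "SHDN", or "HOLD"; "HOLD" is the default,
    matching the assembly language convention described in the text.
    """
    if op == "LD":
        return bus & MASK32                         # load from the bus
    if op == "SHUP":
        return (current << 1) & MASK32              # top bit is lost
    if op == "SHDN":
        return (to_signed32(current) >> 1) & MASK32  # sign bit is kept
    if op == "HOLD":
        return current & MASK32
    raise ValueError("unknown register operation: " + op)
```

For example, shifting the pattern for -8 down one place yields the pattern for -4, which is the behavior expected of an arithmetic rather than a logical shift.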

The  F register  is  a 32-bit  clocked  latch  which  may 
receive  the  output  of  the  arithmetic  logic  element  with  an 
arithmetic  shift  of  +1,  0,  or  -1  or  its  own  output  arithmetically 
shifted up one. The top 16 bits (16-31) and the bottom 16 bits
(0-15) are latched by independent instruction bits whose
mnemonics are FENL for the low bits and FENH for the high bits.
The  state  of  the  F register  cannot  change  unless  the  current 
instruction  is  an  ALU  instruction  and  one  or  both  of  the  F 
register  enable  bits  are  set. 
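The independent latching of the two halves of the F register can be sketched as follows. The function name f_register_next and its arguments are hypothetical; the sketch assumes only what the text states, that the low half changes when FENL is set and the high half changes when FENH is set.

```python
def f_register_next(f, new_value, fenl=False, fenh=False):
    """Return the F register contents after one ALU instruction.

    f          current 32-bit contents
    new_value  32-bit value presented at the F register input
    fenl/fenh  enable bits for the low (0-15) and high (16-31) halves
    """
    low = (new_value if fenl else f) & 0x0000FFFF
    high = (new_value if fenh else f) & 0xFFFF0000
    return high | low
```

With neither enable bit set the function returns f unchanged, which corresponds to the rule that the F register cannot change state unless at least one enable bit is set.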

Note in Figure 12 that the F register cannot be accessed
directly from any of the data buses. Information received at
the inputs of the F register and arithmetic logic element
is controlled by three four-position multiplexers. Except
for one of the B inputs to the arithmetic logic element,
all 32 data bits are affected similarly by the multiplexer
settings. The four settings for the input to the
F register have already been cited. The mnemonics are
AL for the output of the arithmetic logic element (abbreviated
ALU, not meaning the instruction class), ALUP for
the output of the ALU shifted up 1 bit, ALDN for the output
of the ALU shifted down 1 bit, and FLUP (or 0) for
the output of the F latch shifted up 1. These four
states are encoded in bits 30 and 31 of the ALU class
instruction.

The ALU has two inputs, designated A and B. The
A input can be switched to the output of the A register
(mnemonic ASA), the B register (ASB), the output of the
F register (ASF), or the output of the F register shifted
up 1 bit (ASFUP or 0). These settings are controlled by
bits 26 and 27 of the ALU instruction. The B input to
the ALU can be the B register output (BSB), the A register
output (BSA), or the F register with no shift (BSF). The
fourth possible B input (mnemonic BSFUN or 0) treats the
high bits 16 through 31 differently from the low bits 0
through 15: the high 16 bits received are from the B register
shifted up two bits (used for computing square roots),
while the low 16 bits are received from the high 16 bits of
the F register (effectively a "shift down 16 bits" command).
The latter capability is useful when splitting the 32-bit
processor word into two 16-bit words to be transmitted
sequentially to the host computer.
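The split treatment of the fourth B input can be illustrated with a short Python sketch. The function name bsfun_input is hypothetical, and the sketch assumes that "shifted up two bits" means a shift of the full 32-bit B register before the high half is selected; the text does not say whether bits cross from the low half into the high half.

```python
def bsfun_input(b_reg, f_reg):
    """Form the fourth B input of the ALU from the B and F registers.

    High 16 bits: taken from the B register shifted up two bits.
    Low 16 bits:  taken from the high 16 bits of the F register.
    """
    high = (b_reg << 2) & 0xFFFF0000   # assumed to be a full 32-bit shift
    low = (f_reg >> 16) & 0x0000FFFF   # the "shift down 16 bits" path
    return high | low
```

The low half of the result is what makes it possible to send the upper 16 bits of a 32-bit result to the host as a separate 16-bit word.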


The  output  of  the  arithmetic  logic  element  is  gated 
onto  the  enabled  D bus  whenever  the  arithmetic  unit  is 
specified  as  source  device.  This  output  reflects  the  function 
specified  by  the  most  recent  ALU  class  instruction  operating 
on  the  F latch  as  then  loaded  and  the  data  most  recently 
loaded  into  the  A and  B registers.  The  contents  of  the 
F,  A and  B registers  are  retained,  even  if  the  processor 
is  not  running;  but  the  ALU  function  is  cleared  to  zero 
whenever  the  processor  is  halted. 
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ALU 


ARITHMETIC  CLASS  INSTRUCTION 


FORMAT : 

ALU  Function,  A-Function,  B-Function,  Output 

This instruction performs arithmetic-logic operations
and control functions on the A, B, and F registers in the ALU.

"Function" uses these operands:

    ADDL    add low 16 bits of A and B inputs
    ADDH    add high 16 bits of A and B inputs (bits 16-31)
    DADD    add bits 0-31 of A and B inputs
    SUBL    subtract bits 0-15 of A and B inputs (A minus B)
    SUBH    subtract bits 16-31 of A and B inputs (A minus B)
    DSUB    subtract bits 0-31 of A and B inputs (A minus B)
    ANDH    logical "and" of high 16 bits
    ANDL    logical "and" of low 16 bits
    DAND    logical "and" of all 32 bits (A and B)
    ORH     logical "or" of high 16 bits
    ORL     logical "or" of low 16 bits
    DOR     logical "or" of all 32 bits (A or B)
    CLAREG  clears the A register at start of instruction cycle
            and sets up absolute value data routing for the
            quantity in the B register
    DRREC   enables carry from bit 15 to bit 16 (redundant
            if 32 bit arithmetic is otherwise specified)
    DIVIDE  single divide step with 32 bit subtraction
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A-FUNCTION : 


    LD    - Load from A-bus
    SHUP  - Shift contents up 1 bit
    SHDN  - Shift contents down 1 bit
    SA    - A-input from A-register
    SB    - A-input from B-register
    SFUP  - A-input from F-register, shifted up
    SF    - A-input from F-register


B-FUNCTION:

    LD    - Load B-reg from B-bus
    SHUP  - Shift B-contents up 1 bit
    SHDN  - Shift B-contents down 1 bit
    SA    - B-input from A-register
    SB    - B-input from B-register
    SF    - B-input from F-register
    SFUN  - B-input = hi 16 bits from hi-B-latch shifted up 2,
                      lo 16 bits from hi-F-latch

F-FUNCTION OR OUTPUT:

    CLR   - Clear output
    LD    - Load F-register
    LDH   - Load only upper 16 bits of F-reg
    LDL   - Load only lower 16 bits of F-reg
    AL    - F-input comes from ALU
    FLUP  - F-input comes from F-reg shifted up 1
    ALUP  - F-input comes from ALU shifted up 1
    ALDN  - F-input comes from ALU shifted down 1
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Branch  class  (jump)  instructions 


The  fact  that  the  next  instruction  to  be  executed 
is  a branch  class  instruction  is  decoded  during  the  bus 
cycle  preceding  the  branch.  If  the  branch  condition  is 
satisfied  at  the  end  of  that  cycle  (i.e.,  at  the  beginning 
of  the  branch  instruction  itself) , then  the  branch  destination 
address  is  loaded  into  the  program  counter.  The  states  of 
the  branch  conditions  tested  are  not  necessarily  held  over 
for  subsequent  branch  tests,  so  the  programmer  must  take 
care  that  an  expected  condition  holds  as  of  the  beginning 
of  the  branch  instruction  of  interest. 

Execution of a branch entails loading the program
counter with an address which may come from one of four
sources: 1) bits 22-31 of the branch instruction itself,
2) the return-from-subroutine (RTS) register, 3) the
D bus (which must be enabled on the prior instruction), or
4) the PDP-11 Unibus data lines, bits 0-9. One of
these four program address sources must be specified in
bits 20-21 of the branch instruction.

Bits  8 through  18  of  the  branch  class  instruction 
represent  individual  condition  tests  when  set.  If  any  one 
or  more  of  the  tested  conditions  is  true,  the  branch  will 
be executed. Otherwise the program counter will increment
as usual to fetch the next instruction in sequence. The
condition  tested  by  the  "unconditional  branch"  bit  #9 
is  always  true. 

Bit  19  of  the  instruction  signifies  a "jump  to 
subroutine"  operation.  When  this  bit  is  set,  the  current 
address  plus  1 is  loaded  into  the  RTS  register,  regardless 
of  the  results  of  the  condition  test.  The  branch  is  executed 
only  if  some  tested  condition  is  true.  Thus  one  can  program 
"jump  to  subroutine  at  address  a if  the  ALU  is  negative" 
and  other  conditional  subroutine  executions. 

A return  from  subroutine  is  accomplished  by  executing 
a branch  instruction  with  the  program  counter  address  source 
being  the  RTS  register.  Again,  some  tested  branch  condition 
must  be  true  in  order  for  the  return  to  occur.  Note  that 
since  the  RTS  register  can  hold  only  one  number,  subroutines 
cannot  be  nested  conveniently. 

The  branch  instruction  takes  one  bus  clock  cycle 
to  execute,  whether  or  not  the  branch  occurs.  The  branch 
destination  may  be  any  legal  program  memory  address. 
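The branch rules of this section can be summarized in a small Python model. The function branch_step and its argument names are hypothetical, and conditions are represented as sets of mnemonic strings rather than instruction bits. The model reflects three points from the text: any one true tested condition causes the branch, the "always" condition is true by definition, and a jump to subroutine saves the return address whether or not the branch is taken.

```python
def branch_step(pc, rts, tested, true_now, target, jsr=False):
    """Execute one branch class instruction.

    pc        address of the branch instruction
    rts       current contents of the return-from-subroutine register
    tested    set of condition names selected in the instruction
    true_now  set of condition names that hold at the start of the branch
    target    branch destination address
    jsr       True when the jump-to-subroutine bit is set

    Returns (next_pc, rts).
    """
    if jsr:
        rts = pc + 1                      # saved regardless of the test result
    taken = "ALWAYS" in tested or bool(set(tested) & set(true_now))
    next_pc = target if taken else pc + 1
    return next_pc, rts


# Conditional subroutine call followed by an unconditional return.
pc, rts = branch_step(10, 0, {"HLT"}, {"HLT"}, target=50, jsr=True)
return_pc, rts = branch_step(57, rts, {"ALWAYS"}, set(), target=rts)
```

Because rts is a single value, a second call made inside the subroutine would overwrite the first return address, which is the nesting limitation noted above.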
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Do-loop  and  repeat  counters 


There  are  three  autodecrement  registers  which  are  used 
as  counters  in  software  loops.  Two  of  the  counters,  named 
LOOP1 and LOOP2 in the assembly language, are tested and
decremented  by  executing  conditional  branch  instructions. 

The  third  counter,  called  the  repeat  counter,  is  used  for 
one-instruction  loops  — that  is,  to  repeat  an  instruction 
a specified  number  of  times. 

To  use  a LOOP  counter,  the  counter  is  ordinarily  loaded 
with  a number  prior  to  entering  the  software  loop  by  using 
the  LLC  instruction.  The  number  loaded  should  be  one  less 
than  the  number  of  times  the  code  in  the  loop  is  to  be 
executed.  A conditional  branch  instruction  (mnemonic  BRC) 
with  the  name  of  the  LOOP  counter  in  the  test  field  is 
used  to  control  the  loop.  This  instruction  tests  the  current 
value stored in the counter. If it is not zero, the branch
is executed and the counter is decremented (by 1). If it
is zero, no branch occurs, the next following instruction is
executed, and the counter is not decremented.
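The reason for loading one less than the desired count can be seen in a short simulation. The Python sketch below is illustrative; run_counted_loop is a hypothetical name, and the loop body is reduced to a counter so that the number of executions can be checked.

```python
def run_counted_loop(times):
    """Simulate a software loop that should execute its body `times` times.

    Returns (body_executions, final_counter_value).
    """
    if times < 1:
        raise ValueError("the body of such a loop always runs at least once")
    counter = times - 1          # LLC: load one less than the desired count
    executed = 0
    while True:
        executed += 1            # the loop body
        # Conditional branch naming the LOOP counter in its test field:
        if counter != 0:
            counter -= 1         # branch taken, counter decremented
            continue
        break                    # counter is zero: fall through, no decrement
    return executed, counter
```

The body is executed once before the first test, so a counter loaded with N-1 produces N executions and leaves the counter at zero.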

The  repeat  counter  contains  a buffer  register  which 
indefinitely  retains  the  repeat  count  loaded  into  it  by 
the  LRC  instruction.  The  repeat  function  is  activated  by 
setting  bit  2 of  the  instruction  word.  In  the  assembly 
language this is accomplished by writing .COM RE on the
line immediately following the instruction that is to be
repeated.  Whenever  the  repeat  bit  is  set,  the  current 
contents  n of  the  repeat  count  register  are  loaded  into 
the  repeat  counter  and  the  instruction  is  then  executed 
n+1  times. 
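The distinction between the repeat count register and the repeat counter can be sketched as follows. The class RepeatUnit and its methods are hypothetical names used for illustration; the model assumes only that the buffer keeps its value until the next LRC and that the working counter is reloaded from it each time the repeat bit is set.

```python
class RepeatUnit:
    """Toy model of the repeat count register and repeat counter."""

    def __init__(self):
        self.count_register = 0          # buffer written by LRC

    def lrc(self, n):
        self.count_register = n          # retained until the next LRC

    def execute(self, instruction, repeat=False):
        """Run `instruction` once, or n+1 times when the repeat bit is set."""
        counter = self.count_register if repeat else 0
        runs = 0
        while True:
            instruction()
            runs += 1
            if counter == 0:
                break
            counter -= 1
        return runs
```

Since the buffer is not consumed, several different instructions can each be repeated n+1 times after a single LRC.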
BC  BRANCH CLASS INSTRUCTION

FORMAT:

BC  Type, Condition, Address

This instruction permits specification of any possible
branch instruction:

TYPE - uses .word ^B0000000000111000, 0 as a mask with these
operands:

    JSR   - Load return address                          1
    I     - Take branch address from instruction         0
    DBUS  - Take branch address from D bus               2
    RR    - Take branch address from return address reg  4

CONDITION - uses .word ^B111, ^B1111111100000000 and these
operands:

    LLT     - Low ALU less than 0                 4
    LGE     - Low ALU not less than 0            10
    HLT     - Hi ALU negative                    40
    HGE     - Hi ALU positive or 0               20
    L1      - Decrement and test loop 1           1
    L2      - Decrement and test loop 2        1000
    CORNR   - Auto correlator not ready        2000
    ALWAYS  - Unconditional                       2

ADDRESS - uses .word ^B1111111111000000, 0. Any predefined or
user defined symbol or valid number may appear here,
not including special symbols.
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BRANCH  ON  CONDITION 


INSTRUCTION  FORMAT: 

BRC  Test,  addr 

Where addr is any valid address in the program memory. The
first operand, test, is a mask specifying the branch conditions.
The test may be specified by one (or the sum of more than one)
of the following special symbols:

LLT    - Low ALU is less than zero
LGE    - Low ALU is not less than zero
HLT    - High ALU is less than zero
HGE    - High ALU is not less than zero

CORNR  - Auto correlator not ready

L1, L2 - decrement loop counter and branch if zero
Additional information on the architecture of the
vector processor is contained in the accompanying diagrams.
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IV  Appendix 

Tables of Lower Confidence Limits
for the Binomial Distribution.

Suppose there are K successes in a sequence of
N Bernoulli trials. For these variables and a chosen
confidence level C the tables give a lower bound P on
the true probability of success per trial. If the
true probability were lower than P, then the probability
of getting K or more successes in N trials would be less
than 1-C.

Example: An experiment yields 49 successes out
of 50 independent trials. From the table for N = 50,
K = 49, C = 0.9, find that the 90% lower confidence
limit on the true probability is P = 0.9244.
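The definition above determines P as the probability at which the chance of K or more successes in N trials equals 1-C. The Python sketch below computes such a bound by bisection. It is offered as a check on the tabulated values and not as the method used to generate the tables, which the report does not describe; the function names and the tolerance are choices made for this example.

```python
from math import comb


def tail_probability(n, k, p):
    """Probability of k or more successes in n Bernoulli trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def lower_confidence_limit(n, k, c, tol=1e-10):
    """Lower bound P with tail_probability(n, k, P) equal to 1 - c.

    The tail probability increases with p, so bisection on [0, 1] suffices.
    """
    if k == 0:
        return 0.0               # no successes: the bound is trivially zero
    target = 1.0 - c
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_probability(n, k, mid) < target:
            lo = mid             # mid is still too small to be the bound
        else:
            hi = mid
    return (lo + hi) / 2
```

For the worked example, lower_confidence_limit(50, 49, 0.9) agrees with the tabulated value 0.9244 to the four places given.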
CONFIDENCE LEVELS FOR BINOMIAL DISTRIBUTION
P SUCH THAT PROB[X<K GIVEN P] = C

N = 5

             C =   0.99     0.98     0.95     0.90
  K    K/N
  5   1.0000     0.3981   0.4572   0.5492   0.6309
  4   0.8000     0.2221   0.2671   0.3426   0.4161
  3   0.6000     0.1056   0.1352   0.1992   0.2466


CONFIDENCE LEVELS FOR BINOMIAL DISTRIBUTION
P SUCH THAT PROB[X<K GIVEN P] = C

N = 10

             C =   0.99     0.98     0.95     0.90
  K    K/N
 10   1.0000     0.6309   0.6763   0.7411   0.7943
  9   0.9000     0.4958   0.5398   0.6058   0.6632
  8   0.8000     0.3883   0.4295   0.4931   0.5504
  7   0.7000     0.2971   0.3343   0.3934   0.4483
  6   0.6000     0.2183   0.2507   0.3035   0.3542
  5   0.5000     0.1504   0.1773   0.2224   0.2673
CONFIDENCE LEVELS FOR BINOMIAL DISTRIBUTION
P SUCH THAT PROB[X<K GIVEN P] = C

N = 15

             C =   0.99     0.98     0.95     0.90
  K    K/N
 15   1.0000     0.7357   0.7704   0.8190   0.8577
 14   0.9333     0.6321   0.6683   0.7206   0.7644
 13   0.8667     0.5468   0.5830   0.6366   0.6827
 12   0.8000     0.4715   0.5069   0.5602   0.6072
 11   0.7333     0.4031   0.4371   0.4892   0.5360
 10   0.6667     0.3403   0.3725   0.4226   0.4663
  9   0.6000   (illegible) 0.3123   0.3596   0.4035
  8   0.5333   (illegible) 0.2561 (illegible) (illegible)
N = 20

             C =   0.99
  K    K/N
 20   1.0000     0.7943
 19   0.9500     0.7112
 18   0.9000     0.6417
 17   0.8500     0.5793
 16   0.8000